Programming in VLSI

From  Communicating  Processes 
to  Delay-Insensitive  Circuits 

1

Alain  J.  Martin 

California  Institute  of  Technology 


Delays  have  dangerous  ends. 
—William  Shakespeare 


Introduction 

With chip size reaching one million transistors, the complexity of VLSI
algorithms (i.e., algorithms implemented as digital VLSI circuits) is
approaching that of software algorithms (i.e., algorithms implemented as code
for a stored-program computer). Yet design methods for VLSI algorithms lag far
behind the potential of the technology.

Since a digital circuit is the implementation of a concurrent algorithm,
we propose a concurrent programming approach to digital VLSI design. The
circuit to be designed is first implemented as a concurrent program that
fulfills the logical specification of the circuit. The program is then compiled
(manually or automatically) into a circuit by applying semantic-preserving
program transformations. Hence, the circuit obtained is correct by
construction.

The main obstacle to such a method is finding an interface that provides a
good separation of the physical and algorithmic concerns. Among the
physical parameters of the implementation, timing is the most difficult to isolate
from the logical design, because the timing properties of a circuit are
essential not only to its real-time behavior, but also to its logical correctness if the
usual synchronous techniques are used to implement sequencing.

For  this  reason,  delay-insensitive  techniques  are  particularly  attractive  for 
VLSI  synthesis.  A  circuit  is  delay-insensitive  when  its  correct  operation  is 
independent  of  any  assumption  on  delays  in  operators  and  wires  except  that 
the  delays  be  finite  [17].  Such  circuits  do  not  use  a  clock  signal  or  knowledge 
about  delays. 

Let us clarify a matter of definitions right away: The class of entirely
delay-insensitive circuits is very limited. Different asynchronous techniques
distinguish themselves in the choice of the compromises about delay-insensitivity.

Speed-independent techniques assume that delays in gates are arbitrary,
but that there are no delays in wires. Self-timed techniques assume that a
circuit can be decomposed into equipotential regions inside which wire
delays are negligible [16]. In our method, certain local “forks” are introduced to
distribute a variable as inputs of several operators. We assume that the
differences in delays between the branches of the fork are shorter than the delays
in the operators to which the fork is an input. We call such forks isochronic
[6].

Although  we  initially  chose  delay-insensitive  techniques  for  reasons  of 
methodology,  those  techniques  present  other  important  advantages  in  terms 
of  efficiency  and  robustness: 

The  clock  rate  of  a  synchronous  design  has  to  be  slowed  to  account  for 
the  worst-case  clock  skews  in  the  circuit  and  for  the  slowest  step  in  a 
sequence  of  actions.  Since  delay-insensitive  circuits  do  not  use  clocks, 
they  are  potentially  faster  than  their  synchronous  equivalents. 

Since  the  logical  correctness  of  the  circuits  is  independent  of  the  values 
of  the  physical  parameters,  delay-insensitive  circuits  are  very  robust  to 
variations  of  these  parameters  caused  by  scaling  or  fabrication,  or  by 
some  nondeterministic  behavior  such  as  the  metastability  of  arbiters. 
For  instance,  all  the  chips  we  have  designed  have  been  found  to  be 
functional  in  a  range  of  voltage  values  (for  the  constant  voltage  level 
encoding the high logical value) from above 10V to below 1V.

Delay-insensitive  circuit  design  can  be  modular:  A  part  of  a  circuit  can 
be  replaced  by  a  logically  equivalent  one  and  safely  incorporated  into 
the  design  without  changes  of  interfaces. 

Because an operator of a delay-insensitive circuit is “fired” only when
its firing contributes to the next step of the computation, the power
consumption of such a circuit can be much lower than that of its
synchronous equivalent.

Since the correctness of the circuits is independent of propagation
delays in wires and, thus, of the length of the wires, the layout of chips
is facilitated.

The method indeed produces correct and efficient circuits. It has been
applied, with both “hand” compilation and automatic compilation, to a series
of difficult design problems such as distributed mutual exclusion, fair
arbitration, routing automata, stacks, and serial multipliers. All fabricated chips
have been found to be correct on “first silicon”. Although our CMOS
implementation of the basic operators has been overly cautious, and the electrical
optimization techniques have been rather tame, the performance of the chips
has been found at least equal to that of synchronous implementations. We
have just completed the design of a general-purpose microprocessor, and its
performance is very encouraging: In 1.6 µm SCMOS, it runs at 18 million
instructions per second. (See the conclusion, Section 23, for more detail.)

The main reason for the efficiency of the method is that, rather than going
in one step from program to circuit, the designer applies a series of
transformations to the original program. At each stage, powerful algebraic
manipulations can be performed leading to important optimizations in terms of speed
or area.

In  the  first  part  of  this  chapter,  we  present  the  “source  code”  notation,  the 
“object  code”  notation,  and  a  VLSI  implementation  of  the  production  rules  in 
CMOS  technology.  The  source  notation  is  inspired  by  C.  A.  R.  Hoare’s  CSP  [4]:  A 
program  is  a  set  of  concurrent  processes  communicating  by  input  and  output 
commands  on  channels.  (A  similar  experience  in  the  use  of  communicating 
processes  for  programming  in  VLSI  is  described  in  [13].)  The  object  code 
notation,  called  production  rule  set,  is  one  of  the  main  innovations  of  the 
method  and  is  an  interesting  notation  for  digital  VLSI  all  by  itself. 

In the second part, we describe the four main steps of the compilation
(process decomposition, handshaking expansion, production rule expansion,
operator reduction), illustrating them with a number of examples. In
particular, we present the different algebraic transformations that can be applied
at different stages of the compilation and that give the method its flexibility
and efficiency.

Part I: The Source Code and the Object Code

1  The  Program  Notation 

For the sequential part of the notation, we use a subset of Edsger W.
Dijkstra’s guarded command language [3], with a slightly different syntax. We give
only an informal definition of the constructs' semantics.

(i) $b\uparrow$ stands for b := true, $b\downarrow$ stands for b := false. Those assignments are
called “simple assignments”.

(ii) The execution of the selection command $[G_1 \rightarrow S_1 \;\square\; \ldots \;\square\; G_n \rightarrow S_n]$, where
$G_1$ through $G_n$ are boolean expressions, and $S_1$ through $S_n$ are program
parts ($G_i$ is called a “guard”, and $G_i \rightarrow S_i$ a “guarded command”), amounts
to the execution of an arbitrary $S_i$ for which $G_i$ holds. If $\neg(G_1 \lor \ldots \lor G_n)$
holds, the execution of the command is suspended until $(G_1 \lor \ldots \lor G_n)$
holds.

(iii) The execution of the repetition command $*[G_1 \rightarrow S_1 \;\square\; \ldots \;\square\; G_n \rightarrow S_n]$,
where $G_1$ through $G_n$ are boolean expressions, and $S_1$ through $S_n$ are
program parts, amounts to repeatedly selecting an arbitrary $S_i$ for which
$G_i$ holds and executing $S_i$. If $\neg(G_1 \lor \ldots \lor G_n)$ holds, the repetition
terminates.

(iv) Sequencing: Besides the usual sequential composition operator “x; y”, we
introduce two other operators. For atomic actions x and y, “x, y” stands
for the execution of x and y in any order leading to termination. For
noninterfering communication actions x and y, “x • y” stands for the
simultaneous execution of x and y. (We shall return to this definition
when we discuss the implementation of communication in Section 19.)

(v) $[G]$, where G is a boolean expression, stands for $[G \rightarrow skip]$ and thus
for “wait until G holds”. (Hence “$[G]; S$” and $[G \rightarrow S]$ are equivalent.)

(vi) $*[S]$ stands for $*[true \rightarrow S]$ and thus for “repeat S forever”.

(vii) From (ii) and (iii), the operational description of the statement

$$*[[G_1 \rightarrow S_1 \;\square\; \ldots \;\square\; G_n \rightarrow S_n]]$$

is “repeat forever: wait until some $G_i$ holds; execute an $S_i$ for which $G_i$
holds”.

(viii) Tail recursion is allowed, but not general recursion. Functions and
procedures with a simple parameter mechanism are also used, but we will
not discuss them here.
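
The informal semantics of the selection and repetition commands in (ii) and (iii) can be sketched in Python. This is only an illustrative model under stated assumptions: the names `select`, `repeat`, and `gcd_program` are hypothetical, guards and program parts are modelled as Python callables on a dictionary, and a selection with no true guard returns False instead of suspending. The gcd example is a classic guarded-command program chosen for illustration; it is not taken from this chapter.

```python
import random

def select(guarded_commands, state, rng=random):
    """One execution of a selection command.

    guarded_commands is a list of (guard, body) pairs of callables.
    Returns True if some body ran, False if every guard was false
    (the real command would suspend in that case; this sketch does not).
    """
    enabled = [body for guard, body in guarded_commands if guard(state)]
    if not enabled:
        return False
    rng.choice(enabled)(state)  # an arbitrary body whose guard holds
    return True

def repeat(guarded_commands, state, rng=random, max_steps=10_000):
    """Repetition command: keep selecting until all guards are false."""
    steps = 0
    while steps < max_steps and select(guarded_commands, state, rng):
        steps += 1
    return steps

def gcd_program(a, b):
    """Repetition with guards x > y and y > x, for positive integers."""
    state = {"x": a, "y": b}
    commands = [
        (lambda s: s["x"] > s["y"], lambda s: s.update(x=s["x"] - s["y"])),
        (lambda s: s["y"] > s["x"], lambda s: s.update(y=s["y"] - s["x"])),
    ]
    repeat(commands, state)
    return state["x"]
```

Because at most one guard holds at a time in the gcd program, the arbitrary choice in `select` never matters there; it only becomes visible when several guards are true simultaneously.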

1.1 Communicating Processes

A concurrent computation is described as a set of processes composed by
the usual concurrent composition operator ||. The concurrent composition is
weakly fair; i.e., if, in a given state of the computation, x is the next atomic
action of one of the processes, then x will be executed after a possibly
unbounded but finite number of atomic actions from other processes.

Processes  communicate  by  communication  actions  on  ports;  they  do  not 
share  variables.1  A  port  of  a  process  is  paired  with  a  port  of  another  process 
to  form  a  channel.  When  no  messages  are  transmitted,  communication  on 
a  port  is  reduced  to  synchronization  signals.  The  name  of  the  port  is  then 
sufficient  to  identify  a  communication  action. 

If two processes, p1 and p2, share a channel with port X in p1 and port Y
in p2, at any time the number of completed X-actions in p1 equals the
number of completed Y-actions in p2. In other words, the completion of the nth
X-action “coincides” with the completion of the nth Y-action. If, for example,
p1 reaches the nth X-action before p2 reaches the nth Y-action, the
completion of X is suspended until p2 reaches Y. The X-action is then said to be
pending. When, thereafter, p2 reaches Y, both X and Y are completed. The
predicate “X is pending” is denoted as qX. If, for an arbitrary command A, cA
denotes the number of completed A-actions, the semantics of a pair (X, Y) of
communication commands is expressed by the two axioms:

$$cX = cY \qquad \text{(A1)}$$

$$\neg qX \lor \neg qY \qquad \text{(A2)}$$

Surprisingly, it is possible (and even advantageous) to define
communication actions as coincident and yet implement the actions in completely
asynchronous ways.
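
Axioms (A1) and (A2) can be made concrete with a small bookkeeping model in Python. This is a minimal sketch under assumptions: the class name `Channel` and the method `reach` are hypothetical, and the model only tracks the counters cX, cY and the predicates qX, qY; it does not model how the coincidence is implemented in a circuit.

```python
class Channel:
    """Bookkeeping model of a channel pairing port X (in p1) with port Y (in p2)."""

    def __init__(self):
        self.completed = {"X": 0, "Y": 0}        # cX and cY
        self.pending = {"X": False, "Y": False}  # qX and qY

    def reach(self, port):
        """A process reaches its next communication action on `port`."""
        other = "Y" if port == "X" else "X"
        if self.pending[port]:
            raise RuntimeError(f"{port} is already pending")
        if self.pending[other]:
            # The matching action is waiting: both complete together.
            self.pending[other] = False
            self.completed["X"] += 1
            self.completed["Y"] += 1
        else:
            # No partner yet: the action is suspended (pending).
            self.pending[port] = True
        self.check_axioms()

    def check_axioms(self):
        assert self.completed["X"] == self.completed["Y"]      # (A1)
        assert not (self.pending["X"] and self.pending["Y"])   # (A2)
```

For example, calling `reach("X")` on a fresh channel leaves X pending with both counters at zero; a subsequent `reach("Y")` completes both actions at once.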

1.2  Probe 

Instead of the usual selection mechanism by which a set of pending
communication actions can be selected for execution, we provide a general boolean
command on ports, called the probe. The definition of the probe given in [5]
states that in process p1, the probe command $\overline{X}$ has the same value as qY.
For the time being, we use a weaker definition, namely:

$$\overline{X} \Rightarrow qY$$

$$qY \Rightarrow \Diamond\overline{X},$$

where $\Diamond P$ means P holds eventually. (We will return to the first definition in
the example on the implementation of a fair arbiter.)

1. We have made a restricted use of shared variables in the design of the microprocessor.


1.3  Communication 

Matching communication actions are also used to implement a form of
distributed assignment statement, to “pass messages”, as it is often said. In that
case, the pair of commands is specified to consist of an input command and
an output command by adjoining them to the symbols “?” and “!”,
respectively. For example, X? is an input command and X is therefore an input port,
and Y! is an output command and Y is therefore an output port.

Axiom (Communication axiom)

Let X?u and Y!v be matching, where u is a process variable and v is an
expression of the same type as u. The communication implements the assignment
u := v. In other words, if v = V before the communication, then u = V and
v = V after the communication.

1.4  First  Example:  Port  Selection 

Process  sel  repeatedly  performs  communication  action  X  or  communication 
action  Y,  whichever  can  be  completed;  sel  is  blocked  if  and  only  if  neither  X 
nor  Y  can  be  completed: 

$$sel \equiv *[[\overline{X} \rightarrow X \;\square\; \overline{Y} \rightarrow Y]].$$

Obviously,  process  sel  is  not  fair  because  of  the  nondeterministic  choice 
of  a  guard  when  both  guards  are  true.  Negated  probes  make  it  possible  to 
transform  sel  into  a  fair  version,  fsel: 

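$$
\begin{array}{ll}
fsel \equiv *[[ & \overline{X} \rightarrow X;\ [\overline{Y} \rightarrow Y \;\square\; \neg\overline{Y} \rightarrow skip] \\
\quad\square & \overline{Y} \rightarrow Y;\ [\overline{X} \rightarrow X \;\square\; \neg\overline{X} \rightarrow skip] \\
\quad]]. &
\end{array}
$$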


Negated  probes  are  necessary  for  implementing  fairness. 
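
The difference between sel and fsel can be illustrated with a small Python simulation. This is a sketch under assumptions: the functions `sel_trace` and `fsel_trace` are hypothetical, the environment is modelled by two callables returning the current value of each probe, and `choose` plays the role of the nondeterministic choice among true guards (here it can be made adversarial).

```python
def sel_trace(probe_x, probe_y, choose, rounds):
    """Trace of communications performed by sel over `rounds` iterations."""
    trace = []
    for _ in range(rounds):
        enabled = [p for p, probe in (("X", probe_x), ("Y", probe_y)) if probe()]
        if not enabled:
            break  # the real process would wait here
        trace.append(choose(enabled))
    return trace

def fsel_trace(probe_x, probe_y, choose, rounds):
    """Trace of communications performed by fsel over `rounds` iterations."""
    trace = []
    for _ in range(rounds):
        enabled = [p for p, probe in (("X", probe_x), ("Y", probe_y)) if probe()]
        if not enabled:
            break  # the real process would wait here
        first = choose(enabled)
        trace.append(first)
        # After serving one port, serve the other if its probe is true,
        # and skip otherwise (the negated-probe guard).
        other, other_probe = ("Y", probe_y) if first == "X" else ("X", probe_x)
        if other_probe():
            trace.append(other)
    return trace
```

With both probes permanently true and a `choose` that always picks the first enabled guard, `sel_trace` performs only X-actions, whereas `fsel_trace` alternates between X and Y.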

1.5  Second  Example:  Lazy  Stack 

We implement a stack S of size n, n > 0, as a string of n communicating
processes defined as follows:

$$
S = \begin{cases}
h, & \text{if } n = 1, \\
(h \parallel T), & \text{if } n > 1,
\end{cases}
$$

where h, the head of the stack, is a process, and T, the tail of the stack,
is a stack of size n - 1. Process h communicates with the environment of
the stack by the communication actions in?x and out!x, and with T by the
communication actions put!x and get?x. Hence, h.put matches T.in, and h.get
matches T.out. (We assume that no attempt is ever made to add a portion to
a full stack, or to remove a portion from an empty stack.)

Each stack element either is empty and behaves like program E, or is full
and behaves like program F. The epithet “lazy” is attributed to this stack
because no reshuffling of portions takes place after a portion has been removed
from a full stack element.

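$$
\begin{array}{lll}
E \equiv & [ & \overline{in} \rightarrow in?x;\ F \\
 & \square & \overline{out} \rightarrow get?x;\ out!x;\ E \\
 & ] & \\
F \equiv & [ & \overline{out} \rightarrow out!x;\ E \\
 & \square & \overline{in} \rightarrow put!x;\ in?x;\ F \\
 & ] &
\end{array}
$$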


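The following alternative coding of the stack element process, due to Peter
Hofstee, illustrates the advantages of the probe construct:

$$
\begin{array}{ll}
*[[ & \overline{in} \rightarrow in?x \\
\quad\square & \overline{out} \rightarrow get?x \\
\quad]; & \\
\quad[ & \overline{out} \rightarrow out!x \\
\quad\square & \overline{in} \rightarrow put!x \\
\quad]]. &
\end{array}
$$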

We  assume  that  each  stack  element  is  initially  empty. 
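
The behavior of the lazy stack can be sketched as a sequential Python model. This is an illustration under assumptions: the class `StackElement` and the helper `make_stack` are hypothetical, communications are modelled as ordinary method calls (so concurrency and probes are abstracted away), and, as in the text, the model assumes that a full stack is never pushed and an empty stack is never popped.

```python
class StackElement:
    """One stack element; `tail` is the next element or None for the last one."""

    def __init__(self, tail=None):
        self.tail = tail
        self.full = False   # False: behaves like E, True: behaves like F
        self.x = None

    def push(self, value):
        """The environment outputs a portion; the element performs in?x."""
        if self.full:
            self.tail.push(self.x)  # put!x: hand the old portion to the tail
        self.x = value              # in?x
        self.full = True

    def pop(self):
        """The environment requests a portion; the element performs out!x."""
        if not self.full:
            self.x = self.tail.pop()  # get?x: fetch a portion from the tail
        self.full = False             # lazy: no reshuffling after removal
        return self.x                 # out!x

def make_stack(n):
    """Build a stack of n elements, each initially empty."""
    stack = None
    for _ in range(n):
        stack = StackElement(tail=stack)
    return stack
```

Pushing 1, 2, 3 onto a stack of size 3 and popping once leaves the head empty while the other two elements stay full, which is the “lazy” behavior: a later pop on the empty head fetches its portion from the tail only when it is asked for.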

2  The  Object  Code:  Production  Rules 

Carrying the discrete model of computation down to the transistor level
requires that the MOS transistor be idealized as an on/off switch. Unfortunately,
the simple semantics of the switch ignore too many electrical phenomena
that play an important role in the functioning of the circuit. A crucial
innovation of the method is that the transistor need not be viewed as a discrete
switch; voltages can change continuously from one stable level to the other
one, provided that the changes are monotonic.

The notation for the object code provides the weakest possible form of
control structure and the smallest possible number of program constructs;
in fact, it contains exactly one construct, the production rule (PR), and one
control structure, the production-rule set.

We consider the production-rule notation to be the canonical
representation of a digital circuit. This representation can be decomposed into several
equivalent networks of digital operators, depending on the set of building
blocks used, but the production-rule set represents the circuit independently
of the chosen implementation.

Definition A PR is a construct of the form $G \mapsto S$, where S is either a
simple assignment or an unordered list “s1, s2, s3, ...” of simple assignments,
and G is a boolean expression called the guard of the PR.

Example

$$x \wedge y \mapsto z\uparrow$$

$$u\uparrow,\ v\downarrow$$

The  semantics  of  a  PR  are  defined  only  if  the  PR  is  stable. 

Definition A PR $G \mapsto S$ is said to be stable in a given computation if, at
any point of the computation, G either is false or remains invariantly true
until the completion of S.

Stability  is  not  guaranteed  by  the  implementation.  It  has  to  be  enforced  by 
the  compilation  procedure. 

Definition An execution of the stable PR $G \mapsto S$ is an unbounded
sequence of firings. A firing of $G \mapsto S$ with G true amounts to the execution of
S. A firing of $G \mapsto S$ with G false amounts to a skip.

Definition  A  PR  set  is  the  concurrent  composition  of  all  PRs  of  the  set. 


2.1  Operations  on  PR  Sets 

The  only  composition  operation  on  two  PR  sets  is  the  set  union. 

Theorem

The  implementation  of  two  concurrent  processes  is  the  set  union  of  the  two 
PR  sets  implementing  the  processes  and  of  the  PR  sets  implementing  the 
channels  between  the  processes,  if  any. 

The  proof  follows  from  the  associativity  of  the  concurrent  composition 
operator. 

The  other  operations  on  the  PRs  of  a  set  are  those  allowed  by  the  following 
properties: 

Multiple occurrences of the same PR are equivalent to one as a
consequence of the idempotence of the concurrent composition.

The two rules $G \mapsto S1$ and $G \mapsto S2$ are equivalent to the single rule
$G \mapsto S1, S2$.

The two rules $G1 \mapsto S$ and $G2 \mapsto S$ are equivalent to the single rule
$G1 \lor G2 \mapsto S$.

2.2  Noninterference 

We require that complementary PRs (i.e., PRs of the type $G1 \mapsto x\uparrow$ and
$G2 \mapsto x\downarrow$) be noninterfering.

Definition Two complementary PRs are noninterfering when $\neg G1 \lor \neg G2$
holds invariantly.
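
A conservative check of noninterference can be sketched in Python by enumerating all assignments of the guard variables. This is an assumption-laden sketch: the function `interfering_states` is hypothetical, guards are modelled as callables on a dictionary, and the check ranges over every assignment rather than only over the states reachable in a given computation, so it is sufficient but stronger than the invariance the definition requires.

```python
from itertools import product

def interfering_states(g_up, g_down, variables):
    """Return every assignment in which both complementary guards hold.

    An empty result means that (not G1) or (not G2) holds for all
    assignments, which implies noninterference in any computation.
    """
    bad = []
    for values in product([False, True], repeat=len(variables)):
        state = dict(zip(variables, values))
        if g_up(state) and g_down(state):
            bad.append(state)
    return bad
```

For instance, the guards x ∧ y and ¬x ∧ ¬y never hold together, whereas the guards x and y both hold when x and y are true, so a pair of complementary PRs with those guards would need an argument about reachable states instead.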

It  can  be  proven  that,  under  the  stability  of  each  PR  and  noninterference 
among  complementary  PRs,  the  concurrent  execution  of  the  PRs  of  a  set  is 
equivalent  to  the  following  sequential  execution: 

*[select  a  PR  with  a  true  guard ;  fire  the  PR] 

where  the  selection  is  weakly  fair  (each  PR  is  selected  infinitely  often).  From 
now  on,  we  ignore  the  firings  of  a  PR  with  a  false  guard;  a  firing  will  mean  a 
firing  of  a  PR  with  a  true  guard. 
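
The sequential execution of a PR set can be sketched as a short Python interpreter. This is only a sketch under assumptions: the function `run_pr_set` is hypothetical, a PR is modelled as a pair (guard callable, dictionary of simple assignments), stability and noninterference are taken for granted rather than checked, and the random selection is fair only with probability 1. Firings that would change no variable are filtered out so that the loop can detect quiescence.

```python
import random

def run_pr_set(rules, state, max_firings=1000, rng=None):
    """Repeatedly select a PR with a true guard and fire it.

    rules: list of (guard, assignments) where guard maps state -> bool
           and assignments maps variable name -> bool.
    Stops when no PR with a true guard would change the state.
    """
    rng = rng or random.Random(0)
    for _ in range(max_firings):
        effective = [
            assignments
            for guard, assignments in rules
            if guard(state) and any(state[v] != val for v, val in assignments.items())
        ]
        if not effective:
            return state
        state.update(rng.choice(effective))  # fire one selected PR
    return state

# Two inverters in series, written as four PRs on variables x, y, z.
INVERTER_CHAIN = [
    (lambda s: s["x"], {"y": False}),
    (lambda s: not s["x"], {"y": True}),
    (lambda s: s["y"], {"z": False}),
    (lambda s: not s["y"], {"z": True}),
]
```

Starting from x = true, y = true, z = false, the only effective firings set y to false and then z to true, after which the set is quiescent.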

Until  we  return  to  these  issues,  we  shall  assume  that  the  stability  and 
noninterference  requirements  are  fulfilled. 

3  VLSI  Implementation  of  PRs 

Stability  and  noninterference  are  the  two  properties  that  make  the  VLSI 
implementation  of  PRs  (almost)  straightforward.  As  an  example,  we  describe 
how  PRs  can  be  implemented  in  CMOS  technology. 

3.1 The CMOS Transistors

A CMOS circuit is a network of “nodes” (variables) interconnected by
transistors. Certain nodes are also connected to the input-output “pads”, which
provide the interface with the environment; we will ignore the pads in this
presentation. Other nodes are directly connected to the power node,
providing the constant high-voltage value (called VDD) that represents the logical
constant true or 1. Yet other nodes are directly connected to the ground node
(called GND), providing the constant low-voltage value that represents the
logical constant false or 0.

A node takes the continuous range of voltage values between the high
voltage and the low voltage. Above a certain voltage v1 the value is interpreted
as 1. Below another voltage v0, the value is interpreted as 0. Thanks to the
stability property, the precise values of v1 and v0, which vary from node
to node, are irrelevant provided that v0 < v1 and the voltage changes are
monotonic. (Strict monotonicity is not necessary and is actually impossible
to achieve because of noise, but we will not enter into these details here.)

A CMOS transistor is of either n-type or p-type. A transistor relates three
nodes in the following way. Let g, standing for “gate”, and x and y be the
three nodes. When g is false for an n-transistor, and true for a p-transistor,
no current passes through the region between x and y, called the channel,2
thus x and y are left unchanged.

When g is set to true for an n-transistor, or false for a p-transistor, the
channel becomes conducting. In this case, either x and y have the same
voltages and are left unchanged, or a current is established in the channel until
x and y reach the same voltage. The common value reached by x and y
depends on electrical properties of x and y that are determined by the physical
sizes (capacitances) of the nodes implementing x and y and by their
interactions with the rest of the circuit. (Differences in node capacitances may cause
charges to flow through the channel of a transistor in a way that results in
unintended values of the nodes. This phenomenon, called charge sharing,
may make it quite difficult to predict the final voltage value reached by x and
y.)

In order to define the net effect of a PR independently of the physical
parameters of its implementation, we are going to restrict the use of transistors.
(In particular, the restriction will eliminate most occurrences of charge
sharing.)

We impose the condition that a transistor used in isolation connect only
two variables of the circuit: the gate g and one of the other two nodes, say z.
The third node of the transistor is either the power or the ground. With this
restriction, the behavior of a single n-transistor is

$$g \mapsto z\uparrow \quad\text{or}\quad g \mapsto z\downarrow .$$

The behavior of a single p-transistor is

$$\neg g \mapsto z\uparrow \quad\text{or}\quad \neg g \mapsto z\downarrow .$$

2. This notion of channel is unrelated to the one we introduced for communication among
processes.


3.2  Threshold  Voltages 

The current in the channel of a transistor is a function of the so-called
gate-to-source voltage, $V_{gs}$, defined as $V(g) - \min(V(x), V(y))$ for an n-transistor and
as $V(g) - \max(V(x), V(y))$ for a p-transistor. In first approximation, the current
is assumed to be zero when

$$V_{gs} < V_{tn}$$

for an n-transistor and

$$V_{gs} > V_{tp}$$

for a p-transistor. $V_{tn}$ and $V_{tp}$ are called the threshold voltages. (Typically,
$V_{tn} \approx 1\,\mathrm{V}$ and $V_{tp} \approx -1\,\mathrm{V}$.)

Because of the existence of threshold voltages, if an n-transistor is used
to implement $g \mapsto z\uparrow$, the final value of z is not a “strong” 1, since the
channel will stop conducting as soon as the voltage of z is within $V_{tn}$ of the gate
voltage. And symmetrically, a p-transistor used to implement $\neg g \mapsto z\downarrow$ does
not produce a “strong” zero as the final value of z. Since the voltage drops
caused by the threshold voltages accumulate as we compose operators, it is
important to produce strong signals in order to be able to compose an
arbitrary number of operators. We shall therefore restrict our use of n-transistors
to PRs of the form

$$g \mapsto z\downarrow \qquad (1)$$

and p-transistors to production rules of the form

$$\neg g \mapsto z\uparrow . \qquad (2)$$

With these restrictions, all implementations produce strong signals.
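
The accumulation of threshold drops that motivates these restrictions can be illustrated with a few lines of Python. This is a first-approximation sketch under assumptions: the function `weak_high_levels` is hypothetical, each stage is an n-transistor misused to pull its output up from a gate driven by the previous stage, and the output is taken to stop rising exactly one threshold voltage below the gate voltage.

```python
def weak_high_levels(vdd, vtn, stages):
    """Voltage reached at each stage of a chain of n-transistor pull-ups.

    The first gate is driven at vdd; every later gate is driven by the
    (already degraded) output of the previous stage.
    """
    levels = []
    gate = vdd
    for _ in range(stages):
        # The channel stops conducting once V(gate) - V(z) falls to vtn.
        z = max(gate - vtn, 0)
        levels.append(z)
        gate = z
    return levels
```

With a 5 V supply and a 1 V threshold, three such stages would reach only 4 V, 3 V, and 2 V, which is why each operator must restore strong signals.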

Threshold voltages are difficult to adjust in CMOS technology. Actually,
they tend to become more variable as the feature size decreases. (They may
also vary during the activity of the circuit because of some electrical
interaction with the substrate, called body effect.) For constant node capacitance,
variations in thresholds account for most of the discrepancies in propagation
delays on a CMOS chip. In particular, these variations exclude the possibility
that the ordering in space of a set of variables along a common wire be used
to infer an ordering in time of a set of transitions of these variables.

3.3  Switching  Circuits 

Consider the canonical (stable) PR

$$b \mapsto z\downarrow , \qquad (3)$$

where b is a boolean expression in terms of a set of variables. These
variables are used as gates of transistors implementing a switching circuit s
corresponding to b: s is a series-parallel switching circuit between the ground
node and z. The switches are n-transistors whose gates are the variables of
b, possibly negated. Furthermore, we have

b = ‘there is a path from ground to z in s’.

By the construction of s, if b holds and remains stable, z is eventually set
to 0. (For this reason, s is called a pull-down circuit.) Hence, s is exactly the
implementation of production rule (3).

Using a symmetrical argument, we can show that the same series-parallel
circuit as s, but with the power node and z connected, and whose switches
are p-transistors, implements the production rule

$$b_{neg} \mapsto z\uparrow , \qquad (4)$$

where $b_{neg}$ is derived from b by negating all variables. (This circuit is called
a pull-up circuit.)
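
The correspondence between a guard and its series-parallel switching circuit can be sketched in Python. This is an illustrative model under assumptions: the function `conducts` and the nested-tuple encoding of networks are hypothetical, a transistor is treated as an ideal switch, and negated gate variables are omitted for brevity.

```python
from itertools import product

def conducts(net, state, ntype=True):
    """Is there a conducting path through the series-parallel network?

    net is ("t", name) for a single transistor gated by variable `name`,
    or ("series", n1, n2, ...) or ("parallel", n1, n2, ...).
    n-transistors conduct when the gate is true, p-transistors when false.
    """
    kind = net[0]
    if kind == "t":
        gate = state[net[1]]
        return gate if ntype else not gate
    parts = [conducts(sub, state, ntype) for sub in net[1:]]
    return all(parts) if kind == "series" else any(parts)

# Network for the guard a and (c or d).
NET = ("series", ("t", "a"), ("parallel", ("t", "c"), ("t", "d")))

def all_states(names):
    """Every boolean assignment of the given variable names."""
    return [dict(zip(names, vals)) for vals in product([False, True], repeat=len(names))]
```

Built from n-transistors, `NET` conducts exactly when a ∧ (c ∨ d) holds; the same network built from p-transistors conducts exactly when ¬a ∧ (¬c ∨ ¬d) holds, which is the guard obtained by negating all variables.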

4  Operators 

Two PRs that set and reset the same variable, such as

$$
\begin{aligned}
b1 &\mapsto z\uparrow \qquad (5) \\
b2 &\mapsto z\downarrow ,
\end{aligned}
$$

are implemented as one operator.

Let s1 be the pull-up circuit corresponding to b1, and let s2 be the
pull-down circuit corresponding to b2. The two circuits are connected through
the common node z (see Figure 1). Since noninterference has been enforced,
$\neg b1 \lor \neg b2$ holds at any time. This guarantees the absence of a conducting path
between power and ground when the operator is not firing. (A path may exist
for a short time when the operator is firing.)


Definition The operator implementing the two rules is called
“combinational” if $b1 \lor b2$ holds at any time, and “state-holding” otherwise.


By definition, if (5) is combinational, there is always a conducting path
between either VDD or GND and the output z. Hence, the value of the output
is always a strong 0 or a strong 1, and therefore s1 and s2 are together a valid
implementation of (5).

For example, PRs (1) and (2) together implement an inverter as represented
in Figure 2. The circuit of Figure 3 implements the nand-operator defined by
the PRs

$$
\begin{aligned}
a \wedge b &\mapsto z\downarrow \\
\neg a \lor \neg b &\mapsto z\uparrow .
\end{aligned}
$$
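
The distinction between combinational and state-holding operators can be checked mechanically in a simplified setting. The Python sketch below is an assumption-based illustration: the function `classify` is hypothetical, guards are callables on a dictionary, and the conditions are tested over all input assignments rather than over the states that actually occur in a computation.

```python
from itertools import product

def classify(b_up, b_down, variables):
    """Classify the operator with pull-up guard b_up and pull-down guard b_down."""
    states = [dict(zip(variables, vals))
              for vals in product([False, True], repeat=len(variables))]
    if any(b_up(s) and b_down(s) for s in states):
        return "interfering"        # both guards true in some assignment
    if all(b_up(s) or b_down(s) for s in states):
        return "combinational"      # the output is always driven
    return "state-holding"          # some assignment leaves z isolated
```

Under this check the inverter and the nand-operator are combinational, while a pair of guards such as x ∧ y and ¬x ∧ ¬y is state-holding, because neither guard holds when x and y differ.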


If (5) is a state-holding operator, $\neg b1 \wedge \neg b2$ may hold in a certain state. In
such a state, node z is isolated; there is no path between z and either VDD or
GND. In MOS technology, an isolated node does not retain its value forever:
eventually the charges leak away through the substrate and also through the
transistors of the pull-up and pull-down circuits. If the PRs of the operator are
fired frequently enough to prevent leakage, the implementation of Figure 1
can be used for a state-holding operator. Such an implementation is called
dynamic.


Figure  1.  CMOS  implementation  of  a  combinational  operator. 

Otherwise, it is necessary to add a storage element to the output node
of  a  state-holding  operator.  Such  an  implementation  is  called  static.  In  the 
sequel,  we  assume  that  only  static  implementations  are  used  for  state-holding 
operators. 

(A  standard  CMOS  implementation  of  such  a  storage  element  consists  of 
two  cross-coupled  inverters  (see  Figure  4).  This  implementation  inverts  the 
value  of  z.  The  “weak"  inverter,  marked  with  a  letter  w  on  the  figure,  connects 
z  to  either  VDD  or  GND  through  a  high  resistance,  so  as  to  maintain  z  at  its 
intended  voltage  value  [18].) 

The  implementation  of  a  static  state-holding  operator  is  slightly  more 
costly  than  that  of  a  combinational  operator  because  of  the  need  for  a  storage 
device.  Hence,  given  a  pair  of  PRs  that  are  not  combinational,  we  may  first 
try  to  modify  the  guards  —under  the  invariance  of  the  semantics—  so  as  to 
make  them  combinational. 


5  The  Standard  Operators 

All  operators  of  one  or  two  inputs  are  used,  and  are  therefore  viewed  as 
the  standard  operators. 


Figure  2.  A  CMOS  inverter. 

5.1 One-Input Operators

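The two operators with one input and one output are the wire:

$$
\begin{aligned}
x \;\mathbf{w}\; y \;\equiv\;\; & x \mapsto y\uparrow \\
& \neg x \mapsto y\downarrow ,
\end{aligned}
$$

and the inverter:

$$
\begin{aligned}
\neg x \;\mathbf{w}\; y \;\equiv\;\; & \neg x \mapsto y\uparrow \\
& x \mapsto y\downarrow .
\end{aligned}
$$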

Most  operators  we  use  have  more  inputs  than  outputs.  In  general,  however, 
the  components  we  design  have  as  many  outputs  as  inputs.  Hence,  we  need 
to  reset  the  balance  by  introducing  at  least  one  operator,  the  fork,  with  more 
outputs  than  inputs.  A  fork  with  two  outputs  is  defined  as 

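$$
\begin{aligned}
x \;\mathbf{f}\; (y, z) \;\equiv\;\; & x \mapsto y\uparrow, z\uparrow \\
& \neg x \mapsto y\downarrow, z\downarrow .
\end{aligned}
$$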

The  wire  and  the  fork  are  the  only  two  operators  that  are  implemented  not 
as  a  pull-up/pull-down  circuit  —called  a  restoring  circuit—  but  as  a  simple 
conducting  interconnection  between  input  and  outputs. 

Figure 3. CMOS implementation of a nand-gate.

5.2 The Wire as a Renaming Operator

Because the implementation of a wire is the same as that of a node, the wire
behaves as a renaming operator when composed with another operator: The
composition of an arbitrary operator O with output variable x with the wire
x w y is equivalent to O in which x is renamed y. The composition of operator
O with input variable x with the wire y w x is equivalent to O in which x is
renamed y. (Observe that O can even be a wire.)

Unfortunately,  the  fork  is  not  a  renaming  operator  since  the  concurrent 
assignments to the different outputs of the fork are not completed
simultaneously. In order to use a fork as a renaming operator, we will later have to
make  the  timing  assumption  that  such  a  fork  is  isochronic. 

5.3  Combinational  Operators  with  Two  Inputs 

We  construct  all  functions  B  of  two  variables  x  and  y  such  that 

    B ↦ z↑
    ¬B ↦ z↓ .

We get for B: x ∧ y, x ∨ y, and x = y. We will not list the functions obtained by
inverting  inputs  of  B.  (In  the  figures,  a  negated  input  or  output  is  represented 
by  a  small  circle  on  the  corresponding  line.)  This  gives  the  following  set: 


Figure 4. A static implementation of a state-holding operator.

The and, with the infix notation (x, y) ∧ z, is defined as

    x ∧ y ↦ z↑
    ¬x ∨ ¬y ↦ z↓ .

The or, with the infix notation (x, y) ∨ z, is defined as

    x ∨ y ↦ z↑
    ¬x ∧ ¬y ↦ z↓ .

The equality, with the infix notation (x, y) eq z, is defined as

    x = y ↦ z↑
    x ≠ y ↦ z↓ .


5.4  State-Holding  Operators  with  Two  Inputs 

Next, we construct all different two-input-one-output operators of the form

    b1 ↦ z↑
    b2 ↦ z↓

such that ¬b1 ∨ ¬b2 holds at any time, but b1 ≠ ¬b2. We select for b1 either
x ∧ y, or x ∨ y, or x = y. For each choice of b1, we construct b2 as any of the
effective strengthenings of ¬b1.

For b1 = (x ∧ y), we get for b2: ¬x ∧ ¬y, ¬x ∧ y, ¬x, and x ≠ y. The first three
choices of b2 lead to the following state-holding operators:


The C-element:

    (x, y) C z  ≡  x ∧ y ↦ z↑
                   ¬x ∧ ¬y ↦ z↓ .

(The C-element, introduced by David Muller, is described in [15].)

The switch:

    (x, y) sw z  ≡  x ∧ y ↦ z↑
                    ¬x ∧ y ↦ z↓ .

The asymmetric C-element:

    (x, y) aC z  ≡  x ∧ y ↦ z↑
                    ¬x ↦ z↓ .

For b2 = (x ≠ y), we get the operator

    x ∧ y ↦ z↑
    x ≠ y ↦ z↓ .

If the stability condition is fulfilled, however, this operator is not
state-holding. Because of the stability requirement, the state in which ¬x ∧ ¬y
holds (the "storage state") can be reached only from states x ∧ ¬y
and ¬x ∧ y. In both states, ¬z holds, and, therefore, ¬z holds in the
storage state. Hence, we can weaken the guard of the second PR as
(x ≠ y) ∨ (¬x ∧ ¬y), i.e., ¬x ∨ ¬y. Hence, the operator is equivalent to the
and-operator (x, y) ∧ z.
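The reachability argument above can be checked by brute force. The following Python sketch is only an illustration: it assumes that the environment changes one input at a time and that it never falsifies the guard of a rule whose assignment has not yet taken effect (the stability condition). The names `RULES`, `successors`, and `reachable` are hypothetical helpers, not part of the notation.

```python
from collections import deque

# The operator under study:  x ∧ y ↦ z↑   and   x ≠ y ↦ z↓
RULES = [
    (lambda x, y: x and y, True),   # guard, value assigned to z
    (lambda x, y: x != y, False),
]

def successors(state):
    x, y, z = state
    # Effective firing of an enabled rule.
    for guard, target in RULES:
        if guard(x, y) and z != target:
            yield (x, y, target)
    # Environment move: flip one input, unless that would falsify the
    # guard of a rule whose result does not hold yet (stability).
    for nx, ny in ((not x, y), (x, not y)):
        if all(not (g(x, y) and not g(nx, ny) and z != t) for g, t in RULES):
            yield (nx, ny, z)

def reachable(start):
    seen, todo = {start}, deque([start])
    while todo:
        for s in successors(todo.popleft()):
            if s not in seen:
                seen.add(s)
                todo.append(s)
    return seen

states = reachable((True, True, True))   # start just after z↑ has fired
```

Under these assumptions the storage state with z true, `(False, False, True)`, is never reached, which is why the second guard can be weakened without changing the behavior.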

For b1 = (x ∨ y), no effective strengthening of ¬b1 is possible.

For b1 = (x = y), we get the operator:

    x = y ↦ z↑
    x ∧ ¬y ↦ z↓ .

If the stability condition is fulfilled, however, this operator is not
state-holding for the same reasons that the operator with b1 = (x ∧ y) and
b2 = (x ≠ y) is not.


5.5 Flip-Flop

The canonical form we choose for the flip-flop is

    (x, y) ff z  ≡  x ↦ z↑
                    ¬y ↦ z↓ ,

which requires the invariance of ¬x ∨ y to satisfy noninterference. Observe
that the flip-flop (x, y) ff z can always be replaced with the C-element (x, y) C z,
but not vice versa.
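The replacement claim can be verified exhaustively. The sketch below is a minimal illustration in Python; `next_z`, `c_element`, and `flip_flop` are hypothetical names. It compares the two operators in every state allowed by the invariant ¬x ∨ y, and shows why the converse fails: in the state x ∧ ¬y, which a C-element tolerates, both flip-flop guards hold.

```python
from itertools import product

def next_z(set_guard, reset_guard, x, y, z):
    """Value of z after firing: set, reset, or hold when neither guard is true."""
    if set_guard(x, y):
        return True
    if reset_guard(x, y):
        return False
    return z

# C-element:  x ∧ y ↦ z↑ ,  ¬x ∧ ¬y ↦ z↓
c_element = (lambda x, y: x and y, lambda x, y: (not x) and (not y))
# Flip-flop:  x ↦ z↑ ,  ¬y ↦ z↓
flip_flop = (lambda x, y: x, lambda x, y: not y)

def agree_under_invariant():
    for x, y, z in product([False, True], repeat=3):
        if x and not y:          # state excluded by the invariant ¬x ∨ y
            continue
        if next_z(*c_element, x, y, z) != next_z(*flip_flop, x, y, z):
            return False
    return True

# In the excluded state both flip-flop guards are true: interference.
interference = flip_flop[0](True, False) and flip_flop[1](True, False)
```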
6 Multi-Input Operators

Since  there  are  already  164  different  operators  with  three  inputs  and  one 
output,  we  shall  not  pursue  the  systematic  enumeration  that  we  started  with 
two-input  operators.  We  use  n-input  and,  or,  C-element,  whose  definitions  are 
straightforward. 

We use a multi-input flip-flop defined as

    (x1, ..., xk, y1, ..., yl) mff z  ≡  (∃i :: xi) ↦ z↑
                                         (∃i :: ¬yi) ↦ z↓

where (∀i :: ¬xi) ∨ (∀i :: yi) holds.

We also use the combinational if-operator, sometimes called multiplexer,
defined as

    (x, y, z) if u  ≡  (x ∧ y) ∨ (¬x ∧ z) ↦ u↑
                       (x ∧ ¬y) ∨ (¬x ∧ ¬z) ↦ u↓ .

The  most  general  and  most  often  used  operator  is  the  generalized  C-element, 
of  which  all  other  forms  of  C-elements  are  a  special  case.  It  implements  a  pair 
of  PRs 

    B1 ↦ x↑
    B2 ↦ x↓

in which B1 and B2 are arbitrary conjunctions of elementary terms. (As usual,
the two guards have to be mutually exclusive.) For example,

    a ∧ b ∧ ¬c ↦ x↑
    ¬a ∧ d ↦ x↓

can  be  directly  implemented  with  a  generalized  C-element.  Observe  that  the 
limiting  factor  for  the  size  of  the  guards  is  not  the  number  of  inputs,  but  the 
number  of  terms  in  a  conjunction. 


7  Arbiter  and  Synchronizer 

So  far,  we  have  considered  only  PR  sets  in  which  all  guards  are  stable  and 
noninterfering.  But  we  shall  have  to  implement  sets  of  guarded  commands 
—selections  or  repetitions—  in  which  the  guards  are  not  mutually  exclusive, 
as  in  the  probe-selection  example.  Therefore,  we  need  at  least  one  operator 
that provides a nondeterministic choice between two true guards.

7.1 Arbiter

The simplest selection between nonexclusive guards is of the form

    *[[ x → ...
      □ y → ...
    ]] ,

where  x  and  y  are  simple  boolean  variables,  and  the  two  guards  are  stable.  In 
order to distinguish among the three basic states of the system (i.e., neither
x nor y is selected, x is selected, or y is selected) we must introduce two
outputs, say u and v, as follows:

    *[[ x → u↑; ...
      □ y → v↑; ...
    ]] .

Initially, ¬u ∧ ¬v holds as coding of the state "no selection made". Hence, when
the selection is considered completed, which is just a matter of definition, u
and v should be set back to false. We get

    *[[ x → u↑; [¬x]; u↓
      □ y → v↑; [¬y]; v↓          (6)
    ]] .

If ¬u ∧ ¬v holds initially, ¬u ∨ ¬v holds at any time.

The preceding program is a description of the operator known as the "basic
arbiter" or "mutual-exclusion element," denoted as (x, y) arb (u, v). Observe
that the choice between the two guards is not fair.


7.2  Synchronizer 

When  negated  probes  are  used,  for  instance  to  implement  fairness,  we  have 
to  implement  selection  commands  with  unstable  guards.  The  synchronizer 
is  the  only  operator  that  accepts  nonstable  guards.  It  is  defined  as 

    *[[ b ∧ z → u↑; [¬z]; u↓
      □ ¬b ∧ z → v↑; [¬z]; v↓      (7)
    ]] .

Variable b may change at any time from false to true, but both b and z remain
true until u or v has changed. Hence, the guard ¬b ∧ z is unstable, whereas
the guard b ∧ z is stable. As in the arbiter case, if ¬u ∧ ¬v holds initially,
¬u ∨ ¬v holds at any time. (The synchronizer operator was introduced in [7].)

7.3  Implementation  and  Metastability 

The PR sets for (6) and (7) necessarily contain unstable rules. The PR set for
the "unstable arbiter" is

    x ∧ ¬v ↦ u↑
    y ∧ ¬u ↦ v↑
    ¬x ∨ v ↦ u↓
    ¬y ∨ u ↦ v↓ .

The PR set for the "unstable synchronizer" is

    b ∧ z ∧ ¬v ↦ u↑
    ¬b ∧ z ∧ ¬u ↦ v↑
    ¬z ∨ v ↦ u↓
    ¬z ∨ u ↦ v↓ .

The  first  two  PRs  of  the  arbiter  are  unstable  and  can  fire  concurrently.  The 
same  holds  for  the  first  two  production  rules  of  the  synchronizer:  Since  b  can 
change  from  false  to  true  at  any  time,  both  guards  may  evaluate  to  true. 

Let us analyze the PR set implementation of the arbiter. The synchronizer
case is very similar. The state x ∧ y ∧ (u = v) of the arbiter is called metastable.
When started in the metastable state, with ¬u ∧ ¬v, the set of PRs specifying
the arbiter may produce the following unbounded sequence of firings:

    *[(u↑, v↑); (u↓, v↓)] .

In the implementation, nodes u and v may stabilize to a common intermediate
voltage value for an unbounded period of time. Eventually, the inherent
asymmetry of the physical realization (impurities, fabrication flaws, thermal
noise, etc.) will force the system into one of the two stable states where u ≠ v.
But there is no upper bound on the time the metastable state will last, which
means that it is impossible to include an arbitration device into a clocked
system with absolute certainty that a timing failure cannot occur.

The spurious values of u and v produced during the metastable state must
be eliminated since they violate the requirement ¬u ∨ ¬v. Hence, we compose
the "bare" arbiter with a "filter" taking u and v as input and producing uf and
vf as "filtered outputs". The net effect of the filter is

    uf, vf := (u ∧ ¬v), (v ∧ ¬u) .

(In the CMOS construction of the filter shown in Figure 5, we use the threshold
voltages to our advantage: The channel of transistor t1 is conducting only
when (u ∧ ¬v) holds, and the channel of transistor t2 is conducting only when
(v ∧ ¬u) holds.)
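The interplay between the unstable arbiter rules and the filter can be illustrated with a purely discrete model. The Python sketch below is an assumption-laden simplification: it fires the rules for u and v simultaneously, which reproduces the unbounded sequence of firings, but it does not model the intermediate voltages of a real circuit. The function names are hypothetical.

```python
def fire_concurrently(x, y, u, v):
    """Evaluate all four arbiter guards in the same state, then update u and v.

    u↑ has guard x ∧ ¬v and u↓ has the complementary guard ¬x ∨ v, so after a
    joint firing u equals x ∧ ¬v; symmetrically v equals y ∧ ¬u.
    """
    return (x and not v, y and not u)

def filter_outputs(u, v):
    """Net effect of the filter: uf, vf := (u ∧ ¬v), (v ∧ ¬u)."""
    return (u and not v, v and not u)

def metastable_trace(steps):
    """Start with x ∧ y and ¬u ∧ ¬v, and record raw and filtered outputs."""
    u = v = False
    raw, filtered = [], []
    for _ in range(steps):
        u, v = fire_concurrently(True, True, u, v)
        raw.append((u, v))
        filtered.append(filter_outputs(u, v))
    return raw, filtered

raw, filtered = metastable_trace(4)
```

In this model the raw outputs alternate between both true and both false, so ¬u ∨ ¬v is violated on every other step, while the filtered outputs stay false until u and v differ.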

In  delay-insensitive  design,  the  correct  functioning  of  a  circuit  containing 
an  arbiter  or  a  synchronizer  is  independent  of  the  duration  of  the  metastable 
state; therefore, relatively simple implementations of arbiters and
synchronizers can be used. In synchronous design, however, the implementations
have to meet the additional constraint that the probability of the metastable
state lasting longer than the clock period should be negligible.

8 Sequencing and Stability

[...] handshaking expansion, into a collection of sequences of the type

    S ≡ *[[w0]; t0; [w1]; t1; ...]

[...] between the wait-conditions is straightforward.

[...] a semantically equivalent set of production rules. Let

    P ≡ {bi ↦ ti | 0 ≤ i < n}

be such a set.

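Notations and Definitions  For an arbitrary PR p, p.g and p.a denote
the guard and the assignment of p, respectively. The predicate R(p.a) is the
result of the assignment: R(x↑) ≡ x and R(x↓) ≡ ¬x. An execution of p in a
state where R(p.a) does not hold is called effective; otherwise, it is called
vacuous.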

With these definitions, the stability of a PR can be reformulated as follows:

Stability  A PR p is stable in a computation if and only if p.g can be falsified
only in states where R(p.a) holds.

[...] contain negated probes. Since, as we [...] obtained by strengthening the
wait-conditions of S, the stability of the wait-conditions is necessary to
satisfy the stability [...].

A wait-condition w is stable if, once w holds, it remains true at least until
the completion of the following assignment. Unstable wait-conditions can be
caused by negated probes only. They are dealt with separately by introducing
synchronizers. (An example of how this is achieved is given in Section 22.)


8.1  Sequencing 

The set P of PRs implements S when the following conditions are fulfilled:

1. Guard strengthening: The guards of the PRs of P are obtained by
   strengthening the wait-conditions of S: ∀i :: bi ⇒ wi and, in the initial
   state, w0 ⇒ b0.

2. Sequential execution: (N i :: bi ∧ ¬R(ti)) ≤ 1, i.e., at most one effective
   PR can be executed at a time.

3. Program-order execution: The order of execution of effective PRs of P is
   the order specified by S, called the program order, and no deadlock is
   introduced in the construction of P.

As we shall see in Part 2, it is not always possible to construct, for a given
handshaking expansion, a PR set that satisfies the preceding three conditions.
In certain cases, the handshaking expansion must be augmented with
assignments to new variables, called state variables. This transformation, which
is always possible, will be explained in Part 2.


8.2  Acknowledgment 

Fulfilling the second and third conditions requires that for any two PRs
p ≡ b ↦ t and p' ≡ b' ↦ t' such that p immediately precedes p' in the program
order,

    b' ⇒ R(t)

holds in the states where p' is effectively executed. We say that b' is the
acknowledgment of t. Hence the following property:

Acknowledgment Property  For a PR set executed in program order,
the guard of each PR is an acknowledgment of the immediately preceding
assignment.

We shall see that the acknowledgment property is necessary but not
sufficient to ensure program-order execution.

We use two kinds of acknowledgments, depending on the type of variable
used in the assignment. But other forms of acknowledgments can be
envisioned. If t assigns an internal variable, then the acknowledgment is
implemented by strengthening b' as b' ∧ R(t).

For example, if t is x↑, the acknowledgment is b' ∧ x.

If t assigns an external variable, i.e., a variable that implements a
communication [...], another kind of acknowledgment, which we shall [...]
later, can be used. For instance, if lo is an output variable used together with
input variable li to implement a so-called active handshaking protocol, a
possible acknowledgment of lo↑ is li, since li ⇒ lo at this point of the protocol.

8.3 Implementation of Stability


Consider a PR set P, which implements a given program S. We are going to
show that the acknowledgment property, which is necessary to construct a PR
set that implements S, is also sufficient to guarantee stability.

The execution of a PR p of P establishes a path between a constant node
(either VDD or GND), and the node implementing the variable, say x, assigned
by p. Either p.g holds forever after p, or the firing of another PR l, the
invalidating PR of p, will establish ¬p.g, thereby cutting the path from the
constant node to x.

Let p̄ be the complementary PR of p, i.e., the PR with the complementary
assignment. If the PR set contains both p and p̄, then it also contains l
because of the noninterference requirement between complementary PRs. And
we have the order of execution:

    p ≺ l ≺ p̄ .

In all the states between l and p̄, the original path to x is cut. In that case,
we have to see to it that the assignment to x is completed before the path is
cut. Hence the following requirement:

Completion Requirement  Assignment p.a is completed when a PR q
is completed whose guard is an acknowledgment of p.a. The execution order
of the PR set must satisfy

    p ≺ q ≺ l .

Since this requirement is already implied by the acknowledgment property,
the construction of P automatically guarantees stability.


8.4  Self-Invalidating  PRs 

Definition  A PR p is self-invalidating when R(p.a) ⇒ ¬p.g.
For example, ¬x ↦ x↑ is self-invalidating.
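The definition can be tested mechanically for small rules by enumerating all states. The following Python sketch is illustrative only; `is_self_invalidating` and its encoding of a PR as a guard function plus an assigned variable and value are assumptions made here, not notation from the text.

```python
from itertools import product

def is_self_invalidating(guard, var, value, variables):
    """Check R(p.a) ⇒ ¬p.g over every state of the listed boolean variables.

    The PR p is given by its guard (a function of a state dictionary) and its
    assignment var := value, so that R(p.a) is the predicate state[var] == value.
    """
    for bits in product([False, True], repeat=len(variables)):
        state = dict(zip(variables, bits))
        if state[var] == value and guard(state):
            return False      # result holds, yet the guard is still true
    return True

# ¬x ↦ x↑ : the assignment falsifies its own guard.
inverter_loop = is_self_invalidating(lambda s: not s["x"], "x", True, ["x"])
# ¬x ↦ y↑ : an ordinary inverter rule, not self-invalidating.
plain_inverter = is_self_invalidating(lambda s: not s["x"], "y", True, ["x", "y"])
```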

Self-invalidating PRs are excluded by the completion requirement since [...]

For instance, the circuit consisting of an inverter with its output connected
to its input is excluded by the completion requirement since it corresponds
to the PR set:

    ¬x ↦ x↑
    x ↦ x↓ ,

and the two PRs of the set are self-invalidating. However, the PR set

    ¬x ↦ y↑
    y ↦ x↑
    x ↦ y↓
    ¬y ↦ x↓

fulfills the completion requirement, although it is the same circuit as
previously, since the only change is the addition of the wire y w x.

We  eliminate  such  “disguised”  self-invalidating  PRs  by  adding  the  following 
requirement: 

Restoring Acknowledgment Requirement  There is at least
one restoring PR r satisfying p ≺ r ≺ l, where r is restoring if it is not part of
a wire or a fork.

With this extra requirement, all forms of self-invalidating PRs are
eliminated.

It is remarkable that the acknowledgment requirement, which is necessary
to enforce the sequential execution of a PR set, is also sufficient to satisfy
stability. From now on, we can manipulate PRs as if the transitions were discrete.
We have, however, made no simplifying assumption on the physical behavior
of the system. The only physical requirement so far is that of monotonicity.

Another requirement on the implementation is that the rings of operators
that constitute a circuit keep oscillating. It turns out that eliminating
self-invalidating PRs enforces the condition that a ring contain at least three
restoring operators, which is a necessary (and in practice also sufficient)
condition for the ring to oscillate, thanks to the "gain" property of restoring gates.
(See [14] for an explanation of gain.)


Part  II:  The  Compilation  Method 


In this part, we describe how a program in the source notation is transformed
into a semantically equivalent set of VLSI operators. Four major
transformations [...] an intermediate program representation, between
communicating processes and PRs, that allows for important algebraic
manipulations of the program: reshuffling, process factorization, and process
quotient. We illustrate the method with a series of examples that covers
practically all cases.


9  Process  Decomposition 


The first step of the compilation, called process decomposition, consists in
replacing one process with several processes by application of the following
rule:

Decomposition Rule  A process P containing an arbitrary program
part S is semantically equivalent to two processes, P1 and P2, where P1 is
derived from P by replacing S with a communication action, C, on a newly
introduced channel (C, D) between P1 and P2, and P2 is the process
*[[D̄ → S; D]].

The  structure  of  P2  will  be  used  so  frequently  that  we  introduce  an  operator 
to  denote  it:  the  call  operator.  We  denote  it  by  (D/S),  and  we  say  that  D  calls 
(or  activates)  S. 

Observe that process decomposition does not introduce concurrency.
Although P1 and P2 are potentially concurrent, they are never active
concurrently; P2 is activated from P1, much as a procedure or a coroutine would
be. The newly created subprocesses may share variables, but, since the
subprocesses are never active concurrently, there is no conflicting access to the
shared variables. The subprocesses may also share channels; this will require
a special implementation for such channels. Decomposition is applied for
each construct of the language. For construct S, the corresponding process
P2 can be simplified as follows:

If S is the selection [B1 → S1 □ B2 → S2], P2 is simplified as

    *[[ D̄ ∧ B1 → S1; D
      □ D̄ ∧ B2 → S2; D              (8)
    ]] .

If S is the repetition *[B1 → S1 □ B2 → S2], P2 is simplified as

    *[[ D̄ ∧ B1 → S1
      □ D̄ ∧ B2 → S2                 (9)
      □ D̄ ∧ ¬B1 ∧ ¬B2 → D
    ]] .

The assignment x := B, where B is an arbitrary boolean expression, is
implemented as the selection [B → x↑ □ ¬B → x↓], which gives for P2

    *[[ D̄ ∧ B → x↑; D
      □ D̄ ∧ ¬B → x↓; D
    ]] .


The generalizations to the cases of an arbitrary number of guarded
commands in selection and repetition are obvious. All assignments to the same
variable are also grouped in the same process. Process decomposition is
applied repeatedly until the right-hand side of each guarded command is a
straight-line program.

Process decomposition makes it possible to reduce a process with an
arbitrary control structure to a set of subprocesses of only two different types:
either a (finite or infinite) sequence of communication actions, or a repetition
of type (8) or (9).


10  Handshaking  Expansion 


The  next  step  of  the  transformation,  the  handshaking  expansion,  replaces 
each  communication  action  in  a  program  with  its  implementation  in  terms 
of  elementary  actions,  and  each  channel  with  a  pair  of  wire  operators.  We 
shall  first  ignore  the  issue  of  message  transmission  and  implement  only  the 
synchronization  property  of  communication  primitives. 

Channel (X, Y) is implemented by the two wires (xo w yi) and (yo w xi). If X
belongs to process P1 and Y to process P2, then xo and xi belong to P1, and yo
and yi to P2. Initially, xo, xi, yo, and yi, which we will call the "handshaking
variables of (X, Y)", are false. Assume that the program has been proven to
be deadlock-free and that we can identify a pair of matching actions X and Y
in P1 and P2, respectively. We replace X and Y by the sequences Ux and Uy,
respectively, where

    Ux ≡ xo↑; [xi]              (10)
    Uy ≡ [yi]; yo↑ .

We have

    xo ↦ yi↑
    ¬xo ↦ yi↓                   (11)
    yo ↦ xi↑
    ¬yo ↦ xi↓ ,

by definition of the wires. By (10) and (11), any concurrent execution of P1
and P2 contains the following sequence of assignments:

    xo↑; yi↑; yo↑; xi↑ .

10.1  Simultaneous  Completion  of  Nonatomic  Actions 

We introduce a definition of completion of a nonatomic action which makes
it possible to use the notion of simultaneous completion of two nonatomic
actions.

By definition, the execution of an atomic action is considered instantaneous,
and thus the simultaneous completion of two atomic actions does not
make sense. (Atomic actions are simple assignments x↑ and x↓, and
evaluation of simple guards, i.e., guards containing one variable. A wait action
of the form [a] is a nonatomic action that may be treated as the repetition
*[¬a → skip].)

A nonatomic action is initiated when its first atomic action is executed. A
nonatomic action is terminated when its last atomic action is executed.

For nonatomic actions, the notion of completion does not coincide with
that of termination. A nonatomic action might be considered completed even
if it has not terminated, i.e., even if some atomic actions that are part of the
action have not been executed. The definition of suspension is derived from
that of completion.

Definition  A nonatomic action X is completed when it is initiated and is
guaranteed to terminate, i.e., when all possible continuations of the
computation contain the complete sequence of atomic actions of X.

The preceding definition can be further explained as follows: Consider a
prefix t1 of an arbitrary trace of a computation. (A trace is a sequence of
atomic actions corresponding to a possible execution of the program.) The
completion of X is identified with the point in the computation where t1 has
been completed, if (1) X is initiated in t1, and (2) all possible sequences t2,
such that t1 extended with t2 is a valid trace of the computation, contain
the remaining atomic actions of X. Hence the completions of two nonatomic
actions coincide if their completion points coincide.

(Observe that there may be several points in a trace that can act as
completion point, which makes it easier to align the two completion points of two
overlapping sequences so as to implement the bullet operator.)

Definition  Between  initiation  and  completion,  an  action  is  suspended. 

These  definitions  of  completion  and  suspension  are  valid  because  they 
satisfy  the  three  semantic  properties  of  completion  and  suspension  that  are 
used  in  correctness  arguments,  namely: 

1. {cX = x} X {cX = x + 1},

2. qX ⇒ pre(X), where pre(X) is any precondition of X in terms of the program
   variables and auxiliary program variables,

3. If X is completed, eventually X is terminated.

These definitions will be used to implement the bullet operator and the
communication primitives as defined by axioms A1 and A2. Consider the
interleaving of Ux and Uy. At the first semicolon, i.e., after xo↑, Ux has been
initiated, but it cannot be considered completed since the valid continuation
that does not contain Uy does not contain the rest of Ux. At the second
semicolon, both Ux and Uy have been initiated, and thus all continuations contain
the rest of the interleaving of Ux and Uy. Hence, Ux and Uy are guaranteed to
terminate when they are both initiated, i.e., they fulfill A1 and A2.


10.2  Four-Phase  Handshaking 

Unfortunately, when the communication implemented by Ux and Uy
terminates, all handshaking variables are true. Hence, we cannot implement the
next communication on channel (X, Y) with Ux and Uy. The complementary
implementation, however, can be used for the next matching pair, that is:

    Dx ≡ xo↓; [¬xi]
    Dy ≡ [¬yi]; yo↓ .

The solution consisting in alternating Ux and Dx as an implementation of
X, and Uy and Dy as an implementation of Y, is called two-phase handshaking,
or two-cycle signaling. Since it is in most cases impossible to determine
syntactically which X- or Y-actions follow each other in an execution, the general
two-phase handshaking implementations require testing the current value of
the variables as follows:

    xo := ¬xo; [xi = xo]
    [yi ≠ yo]; yo := ¬yo .
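The general two-phase scheme can be traced by hand for a few communications. The Python sketch below is a deliberately sequentialized illustration, not a concurrent simulation: it executes the only possible interleaving of the active side, the two wires (xo w yi) and (yo w xi), and the passive side, and asserts that each wait-condition holds when it is reached. The function `two_phase` is a hypothetical helper.

```python
def two_phase(n):
    """Run n communications of the general two-phase protocol."""
    xo = xi = yo = yi = False
    trace = []
    for _ in range(n):
        xo = not xo                # active side:  xo := ¬xo
        trace.append("xo")
        yi = xo                    # wire xo w yi
        trace.append("yi")
        assert yi != yo            # passive wait [yi ≠ yo] now holds
        yo = not yo                # passive side: yo := ¬yo
        trace.append("yo")
        xi = yo                    # wire yo w xi
        trace.append("xi")
        assert xi == xo            # active wait [xi = xo] now holds
    return trace, (xo, xi, yo, yi)

one = two_phase(1)
two = two_phase(2)
```

Each communication costs four transitions, and the variables return to false only after every second communication, which is why the wait-conditions must compare values rather than test for a fixed one.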

In general, we prefer to use a simpler solution, known as four-phase
handshaking, or four-cycle signaling. In a four-phase handshaking protocol,
X-actions are implemented as "Ux; Dx" and Y-actions as "Uy; Dy". Observe that
the D-parts in X and Y introduce an extra communication between the two
processes whose only purpose is to reset all variables to false.

Both protocols have the property that for a matching pair (X, Y) of actions,
the implementation is not symmetrical in X and Y. One action is called active
and the other one passive. The four-phase implementation, with X active and
Y passive, is

    X ≡ xo↑; [xi]; xo↓; [¬xi]          (12)
    Y ≡ [yi]; yo↑; [¬yi]; yo↓ .        (13)

(Later,  we  will  introduce  an  alternative  form  of  active  implementation,  called 
lazy-active.) Although four-phase handshaking contains twice as many
actions as two-phase handshaking, the actions involved are simpler and are
more amenable to the algebraic manipulations we shall introduce later. When
operator delays dominate the communication costs, which is the case for
communication inside a chip, four-phase handshaking will, in general, lead
to more efficient solutions. When transmission delays dominate the
communication costs, which is the case for communication between chips, two-phase
handshaking  is  preferred. 

10.3  Probe 

A simple implementation of the probe X̄ is xi, with X implemented as passive.
(Given our definition of suspension, the proof that this implementation of
the probe fulfills its definition is straightforward.)

A probed communication action X̄ → ... X is then implemented as

    xi → ... xo↑; [¬xi]; xo↓ .

10.4 Choice of Active versus Passive Implementation

When no action of a matching pair is probed, the choice of which action
should be active and which passive is arbitrary, but a choice has to be made.
The choice can be important for the composition of identical circuits. A
simple rule is that, for a given channel (X, Y), all actions on one port (called the
active port) are active, and all actions on the other port (called the passive
port) are passive. If X̄ is used, all X-actions are passive, with the obvious
restriction that Ȳ cannot be used in the same program.

We  shall  see,  however,  that  this  criterion  for  choosing  active  and  passive 
ports  may  conflict  with  another  criterion  related  to  the  implementation  of 
input  and  output  commands. 

10.5  Properties  of  the  Handshaking  Protocol 

For a matching pair (X, Y) of actions implemented as (12) and (13), and the
wires (xo w yi) and (yo w xi), the concurrent execution of X and Y causes the
sequence of assignments

    xo↑; yi↑; yo↑; xi↑; xo↓; yi↓; yo↓; xi↓ ,

called  the  handshaking  protocol.  The  following  properties  of  the  handshaking 
protocol  play  an  important  role  in  the  compilation  method. 

Property 1  For xo and xi used as in the active protocol of (12), xi is an
acknowledgment of xo↑ and ¬xi is an acknowledgment of xo↓. For yo and yi
used as in the passive protocol of (13), ¬yi is an acknowledgment of yo↑ and
yi is an acknowledgment of yo↓.

Property 2  In (12) and (13), Dx and Dy are used only to reset all variables
to false. Hence, provided that the cyclic order of the actions of (12) and
(13) is maintained, the sequences Dx and Dy can be inserted at any place
in the program of each of the processes without invalidating the semantics
of the communication involved. This transformation, called reshuffling, may
introduce a deadlock.

Property 3  The wait-actions of (12) and (13) are stable. Reshuffling
maintains the stability.
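The sequence of assignments called the handshaking protocol can be reproduced with a small simulation of the active process (12), the passive process (13), and the two wires. The Python sketch below is one possible model under stated assumptions: each process is a list of `set` and `wait` steps, a fixed scheduling order is used (harmless here, since at most one assignment is ever possible), and `+` and `-` stand for ↑ and ↓ in the trace. All names are hypothetical.

```python
ACTIVE_X = [("set", "xo", True), ("wait", "xi", True),
            ("set", "xo", False), ("wait", "xi", False)]
PASSIVE_Y = [("wait", "yi", True), ("set", "yo", True),
             ("wait", "yi", False), ("set", "yo", False)]
WIRES = [("xo", "yi"), ("yo", "xi")]      # (input, output) of each wire

def run_handshake():
    var = {"xo": False, "xi": False, "yo": False, "yi": False}
    programs = [ACTIVE_X, PASSIVE_Y]
    pcs = [0, 0]                           # next step of each process
    trace = []
    progress = True
    while progress:
        progress = False
        for k, prog in enumerate(programs):
            if pcs[k] == len(prog):
                continue                   # this process has terminated
            kind, name, value = prog[pcs[k]]
            if kind == "set":
                var[name] = value
                trace.append(name + ("+" if value else "-"))
            elif var[name] != value:
                continue                   # wait-condition false: stay suspended
            pcs[k] += 1
            progress = True
        for src, dst in WIRES:             # a wire copies its input to its output
            if var[dst] != var[src]:
                var[dst] = var[src]
                trace.append(dst + ("+" if var[dst] else "-"))
                progress = True
    return trace, var

trace, final = run_handshake()
```

The recorded trace follows the eight-step protocol, and all four handshaking variables are false again at the end, ready for the next communication.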

Reshuffling,  which  is  the  source  of  significant  optimizations,  will  be  used 
extensively.  It  is  therefore  important  to  know  when  Property  2  can  be  applied 
without introducing deadlock.

There are two simple cases where the reshuffling of sequence "Ux; Dx; S"
into sequence "Ux; S; Dx" does not introduce deadlock:

S contains no communication action, or

X is an internal channel introduced by process decomposition.


11 Production-Rule Expansion

Production-rule expansion is the transformation from a handshaking
expansion to a set of PRs. It is the most crucial and most difficult step of the
compilation since it requires the enforcement of sequencing by semantic means.
It consists of three steps:

1.  State  assignment, 

2.  Guard  strengthening, 

3.  Symmetrization. 

We shall explain the algorithms for production-rule expansion with an
example: the implementation of the simple process (L/R), where R is an active
channel. This process is one of the basic building blocks for implementing
sequencing. The handshaking expansion gives

*[[li]; ro↑; [ri]; ro↓; [¬ri]; lo↑; [¬li]; lo↓].   (14)

We now consider the handshaking expansion as the specification of the
implementation: Any implementation of the program has to satisfy the ordering
defined by (14). The next step is to construct a production-rule set that
satisfies this ordering. We start with the production-rule set that is
syntactically derived from (14):

li   ↦  ro↑
ri   ↦  ro↓
¬ri  ↦  lo↑
¬li  ↦  lo↓.

(As  a  clue  to  the  reader,  PRs  of  a  set  are  listed  in  program  order.) 

Since the program is deadlock-free, effective execution of the PRs in
program order is always possible. Some other execution orders, however, may
also be possible. The production-rule set satisfies the handshaking-expansion
specification if, and only if, the only possible execution order is the program
order. If execution orders other than the program order are possible for the
production-rule set, the guards of some rules are strengthened so as to
eliminate these execution orders.

In our example, program order is not the only execution order for the
syntactic production-rule set: Since ¬ri holds initially, the third PR can be
executed first. This is also true for the fourth PR, but the execution of the
fourth rule in the initial state is vacuous. Because all handshaking variables
of R are back to false when R is completed, we cannot find a guard for the
transition lo↑ that holds only as a precondition of lo↑ in (14). Hence, we
cannot distinguish the state following R from the state preceding R, and thus
the sequential execution condition introduced in Section 8 cannot be satisfied.
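The failure of program order can be observed directly by listing the rules that are effective in the initial state. The following sketch is only an illustration: the environment rules (an active partner on L and a passive partner on R, each modeled as a single gate) and the helper names are assumptions that do not appear in the original text.

```python
def rule_name(rule):
    """Render a (guard, variable, value) rule as a transition name."""
    return rule[1] + ("↑" if rule[2] else "↓")

def effective(rules, state):
    """Rules whose guard holds and whose firing would change the state."""
    return [r for r in rules if r[0](state) and state[r[1]] != r[2]]

CIRCUIT = [  # syntactic PR set derived from (14), in program order
    (lambda s: s["li"],     "ro", True),
    (lambda s: s["ri"],     "ro", False),
    (lambda s: not s["ri"], "lo", True),
    (lambda s: not s["li"], "lo", False),
]
ENVIRONMENT = [  # assumed environment, not part of the original text
    (lambda s: not s["lo"], "li", True),   # partner on L raises li when lo is low
    (lambda s: s["lo"],     "li", False),
    (lambda s: s["ro"],     "ri", True),   # partner on R copies ro onto ri
    (lambda s: not s["ro"], "ri", False),
]

initial = dict.fromkeys(["li", "lo", "ri", "ro"], False)
ENABLED_INITIALLY = [rule_name(r) for r in effective(CIRCUIT + ENVIRONMENT, initial)]
print(ENABLED_INITIALLY)
```

Besides the expected environment transition li↑, the circuit transition lo↑ is already effective in the initial state, which is the out-of-order execution described above. The fourth rule does not show up because its firing would be vacuous.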

This is a general problem, since it arises for each unshuffled communication
action. In order to fulfill the sequential-execution condition, we have to
guarantee  that  each  state  of  the  handshaking  expansion  is  unique,  i.e.,  that 
there  exists  a  predicate  in  terms  of  variables  of  the  program  that  holds  only 
in  this  state.  The  task  of  transforming  the  handshaking  expansion  so  as  to 
make  each  state  unique  is  called  state  assignment. 


11.1  State  Assignment  with  State  Variables 

The first technique to define uniquely the state in which the transition lo↑
is to take place consists in introducing a state variable, say x, initially
false. Handshaking expansion (14) becomes

*[[li]; ro↑; [ri]; x↑; [x]; ro↓; [¬ri]; lo↑; [¬li]; x↓; [¬x]; lo↓].   (15)


Observe that (15) is semantically equivalent to (14) since the two sequences
of actions that are added to (14), namely, x↑; [x] and x↓; [¬x], are equivalent
to a skip. (The newly introduced variable x is used nowhere else.)

There are several places where the two assignments to the state variable
can be introduced. In general, a good heuristic is to introduce those
assignments at such places that the alternation between waits and assignments
is maintained. There are other heuristics, however, that can play a role in the
placement of the variables.

Once  state  variables  have  been  introduced  so  as  to  distinguish  any  two 
states  of  the  handshaking  expansion,  it  is  possible  to  strengthen  the  guards 
of  the  PRs  to  enforce  program-order  execution.  The  basic  algorithm  for  guard 
strengthening  can  be  found  in  [10].  We  shall  not  describe  it  here.  Applied  to 
(15), it gives

¬x ∧ li   ↦  ro↑   (16)
ri        ↦  x↑    (17)
x         ↦  ro↓   (18)
x ∧ ¬ri   ↦  lo↑   (19)
¬li       ↦  x↓    (20)
¬x        ↦  lo↓   (21)


It  is  easy  to  check  that  the  acknowledgment  property  is  fulfilled  and  that  the 
only  possible  execution  order  for  the  preceding  production-rule  set  is  the 
program  order  defined  by  (15). 
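This check can also be carried out mechanically. The sketch below closes the circuit with an assumed environment (one gate per channel, which is an assumption of this example and not part of the original text) and fires rules until the cycle closes. The function `run` is a hypothetical helper that raises an error as soon as more than one rule is effective.

```python
RULES = [
    (lambda s: not s["x"] and s["li"], "ro", True),   # (16)
    (lambda s: s["ri"],                "x",  True),   # (17)
    (lambda s: s["x"],                 "ro", False),  # (18)
    (lambda s: s["x"] and not s["ri"], "lo", True),   # (19)
    (lambda s: not s["li"],            "x",  False),  # (20)
    (lambda s: not s["x"],             "lo", False),  # (21)
    # Assumed environment (not in the original text):
    (lambda s: not s["lo"], "li", True),
    (lambda s: s["lo"],     "li", False),
    (lambda s: s["ro"],     "ri", True),
    (lambda s: not s["ro"], "ri", False),
]

def run(rules, state, steps):
    """Fire the unique effective rule `steps` times and return the trace."""
    trace = []
    for _ in range(steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        if len(ready) != 1:
            raise ValueError("execution is not sequential in %r" % state)
        _, var, value = ready[0]
        state[var] = value
        trace.append(var + ("↑" if value else "↓"))
    return trace

TRACE = run(RULES, dict.fromkeys(["li", "lo", "ri", "ro", "x"], False), 10)
print("; ".join(TRACE))
```

With this environment, ten firings return the system to its initial state, and the transitions occur in the order prescribed by (15).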


12  Operator  Reduction 

The  last  step  of  the  compilation,  called  operator  reduction,  groups  together 
the  PRs  that  assign  the  same  variables.  Those  PRs  are  then  identified  with 
(and  implemented  as)  an  operator.  The  program  is  thus  identified  with  a  set 
of  operators. 

Since we have enforced the stability of each rule and noninterference
between any two complementary rules, we can implement any set of PRs
directly. (For reasons of efficiency, we must see to it that the guards do not
contain  too  many  variables  in  a  conjunct,  which  would  lead  to  too  many 
transistors  in  series.  Hence,  the  implementation  of  the  set  may  also  involve 
decomposing  a  PR  into  several  PRs  by  introducing  new  internal  variables.) 

The direct implementation of the PR set (16) through (21) is
straightforward:

(16) and (18) correspond to the asymmetric C-element (¬x, li) aC ro.

(19) and (21) correspond to the asymmetric C-element (x, ¬ri) aC lo.

(17) and (20) correspond to the flip-flop (ri, li) ff x.

If the preceding operators are implemented as dynamic, this implementation
of process (L/R) is the simplest possible. If static implementations
of  the  operators  are  required,  another  implementation  might  be  considered 
with  fewer  state-holding  elements  since,  as  we  have  explained  in  the  first 
part,  static  state-holding  operators  are  slightly  more  difficult  to  realize  than 
combinational  operators. 

A last transformation, called symmetrization, may be performed on the PR
set to minimize the number of state-holding operators. Since symmetrization
also introduces inefficiencies of its own, however, it should not be applied
blindly.

13 Symmetrization

Symmetrization is performed on the two guards of PRs b1 ↦ z↑ and b2 ↦ z↓,
when one of the two guards, say, b1, is already in the form x ∧ ¬b2. If
we replace guard b2 with ¬x ∨ b2, then the two guards are complements of
each other, i.e., the operator is combinational. Of course, weakening guard
b2 is a dangerous transformation since it may introduce a new state where
the guard holds. We have to check that this does not occur by checking the
following invariant:

Given the new rule ¬x ∨ b2 ↦ z↓, ¬z must hold in any state where ¬x ∧ ¬b2
holds, i.e., we have to check the invariant truth of

x ∨ b2 ∨ ¬z.

13.1  Operator  Reduction  of  the  (L/R)-element 

The symmetrization of PRs (16) and (18), and of (19) and (21) of the
(L/R)-element, gives

¬x ∧ li   ↦  ro↑   (16)
ri        ↦  x↑    (17)
¬li ∨ x   ↦  ro↓   (18)
x ∧ ¬ri   ↦  lo↑   (19)
¬li       ↦  x↓    (20)
ri ∨ ¬x   ↦  lo↓.  (21)

(16) and (18) correspond to the and-operator (¬x, li) ∧ ro.

(17) and (20) correspond to the flip-flop (ri, li) ff x.

(19) and (21) correspond to the and-operator (x, ¬ri) ∧ lo.

(17) and (20) can also be implemented as the C-element (li, ri) C x.

The resulting circuit is shown in Figure 6. (The dot identifies the input that
is activated first.) This implementation of (L/R), either with a flip-flop or
with a C-element, is called a Q-element. The Q-element implementing (L/R) as
before is described by the infix notation (li, lo) Q (ri, ro).
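The invariant of Section 13 can be checked for the (L/R)-element by visiting the states reached by the unsymmetrized rules. In the sketch below, the environment rules and all helper names are assumptions made for illustration. For the pair (16)/(18) the invariant instantiates to li ∨ x ∨ ¬ro, and for the pair (19)/(21) to ¬ri ∨ ¬x ∨ ¬lo.

```python
NAMES = ["li", "lo", "ri", "ro", "x"]

ENVIRONMENT = [  # assumed environment, not part of the original text
    (lambda s: not s["lo"], "li", True),
    (lambda s: s["lo"],     "li", False),
    (lambda s: s["ro"],     "ri", True),
    (lambda s: not s["ro"], "ri", False),
]
SHARED = [  # rules left unchanged by symmetrization
    (lambda s: not s["x"] and s["li"], "ro", True),   # (16)
    (lambda s: s["ri"],                "x",  True),   # (17)
    (lambda s: s["x"] and not s["ri"], "lo", True),   # (19)
    (lambda s: not s["li"],            "x",  False),  # (20)
]
ORIGINAL = SHARED + [
    (lambda s: s["x"],     "ro", False),              # (18)
    (lambda s: not s["x"], "lo", False),              # (21)
]
SYMMETRIZED = SHARED + [
    (lambda s: not s["li"] or s["x"], "ro", False),   # (18), weakened
    (lambda s: s["ri"] or not s["x"], "lo", False),   # (21), weakened
]

def step_through(rules, steps):
    """Fire the unique effective rule `steps` times; return states and trace."""
    state = dict.fromkeys(NAMES, False)
    states, trace = [dict(state)], []
    for _ in range(steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        if len(ready) != 1:
            raise ValueError("execution is not sequential in %r" % state)
        _, var, value = ready[0]
        state[var] = value
        trace.append(var + ("↑" if value else "↓"))
        states.append(dict(state))
    return states, trace

states, trace = step_through(ORIGINAL + ENVIRONMENT, 10)
INVARIANT_RO = all(s["li"] or s["x"] or not s["ro"] for s in states)
INVARIANT_LO = all(not s["ri"] or not s["x"] or not s["lo"] for s in states)
SAME_TRACE = trace == step_through(SYMMETRIZED + ENVIRONMENT, 10)[1]
print(INVARIANT_RO, INVARIANT_LO, SAME_TRACE)
```

Both invariants hold on every visited state, so the weakened guards only become true where the corresponding transition is vacuous, and the symmetrized set produces the same trace as the original one under the assumed environment.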

14  Isochronic  Forks 

In the previous operator reduction, li is an input to the flip-flop
(li, ri) ff x and to the and-operator (li, ¬x) ∧ ro. Formally, in order to
compose the PRs together to form a circuit, we have to introduce the fork
li f (l1, l2) and replace li by l1 as input of the and-operator, and by l2 as
input of the flip-flop. We also have to introduce the forks ri f (r1, r2) and
x f (x1, x2) for the same reason. Let us analyze the effect of the first fork
only. The PR set that includes the PRs of the fork is

li        ↦  l1↑, l2↑   (16a)
¬x ∧ l1   ↦  ro↑        (16b)
ri        ↦  x↑         (17)
¬l1 ∨ x   ↦  ro↓        (18)
x ∧ ¬ri   ↦  lo↑        (19)
¬li       ↦  l1↓, l2↓   (20a)
¬l2       ↦  x↓         (20b)
ri ∨ ¬x   ↦  lo↓.       (21)

Now we observe that transition l1↑ of (16a) is acknowledged by the guard of
(16b) but l2↑ is not, and transition l2↓ of (20a) is acknowledged by the guard
of (20b) but l1↓ is not. Hence, the assignments l2↑ and l1↓ do not fulfill the
completion requirement and thus are not stable!

Figure 6. Implementation of (L/R) with a Q-element.

We solve this problem by making a simplifying assumption: We assume
that the fork is isochronic. That is, the difference in delays between the two
branches of the fork is shorter than the delays in the operators to which the
fork is an input. Hence, when a transition on one output is acknowledged and
thus completed, the transition on the other output is also acknowledged and
thus completed.
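The effect of the assumption can be illustrated by exploring every interleaving of the forked PR set. The sketch below is a simplified model and not the author's method: the environment rules are assumed, a non-isochronic fork is modeled as two independent rules per transition of li, and an isochronic fork is approximated by a single rule that switches l1 and l2 together. The function `unstable_firings` (a hypothetical name) counts firings that falsify the guard of another rule whose transition is still pending.

```python
from collections import deque

VARS = ("li", "lo", "ri", "ro", "x", "l1", "l2")

def rule(guard, *assigns):
    """A production rule: a guard and one or more (variable, value) targets."""
    return (guard, assigns)

def pending(r, s):
    return r[0](s) and any(s[v] != val for v, val in r[1])

def unstable_firings(rules):
    """Explore all interleavings; count firings that disable a pending rule."""
    init = dict.fromkeys(VARS, False)
    seen, todo, bad = {tuple(init.values())}, deque([init]), 0
    while todo:
        s = todo.popleft()
        ready = [r for r in rules if pending(r, s)]
        for r in ready:
            t = dict(s)
            t.update(r[1])
            for q in ready:
                undone = any(t[v] != val for v, val in q[1])
                if q is not r and undone and not q[0](t):
                    bad += 1  # q lost its guard before completing
            key = tuple(t.values())
            if key not in seen:
                seen.add(key)
                todo.append(t)
    return bad

COMMON = [
    rule(lambda s: not s["x"] and s["l1"], ("ro", True)),   # (16b)
    rule(lambda s: s["ri"],                ("x", True)),    # (17)
    rule(lambda s: not s["l1"] or s["x"],  ("ro", False)),  # (18)
    rule(lambda s: s["x"] and not s["ri"], ("lo", True)),   # (19)
    rule(lambda s: not s["l2"],            ("x", False)),   # (20b)
    rule(lambda s: s["ri"] or not s["x"],  ("lo", False)),  # (21)
    # Assumed environment (not in the original text):
    rule(lambda s: not s["lo"], ("li", True)),
    rule(lambda s: s["lo"],     ("li", False)),
    rule(lambda s: s["ro"],     ("ri", True)),
    rule(lambda s: not s["ro"], ("ri", False)),
]
SPLIT_FORK = [  # the two branches switch independently
    rule(lambda s: s["li"], ("l1", True)),
    rule(lambda s: s["li"], ("l2", True)),
    rule(lambda s: not s["li"], ("l1", False)),
    rule(lambda s: not s["li"], ("l2", False)),
]
ATOMIC_FORK = [  # idealized isochronic fork: both branches switch together
    rule(lambda s: s["li"], ("l1", True), ("l2", True)),       # (16a)
    rule(lambda s: not s["li"], ("l1", False), ("l2", False)), # (20a)
]

UNSTABLE_SPLIT = unstable_firings(COMMON + SPLIT_FORK)
UNSTABLE_ATOMIC = unstable_firings(COMMON + ATOMIC_FORK)
print(UNSTABLE_SPLIT, UNSTABLE_ATOMIC)
```

In this model the split fork admits interleavings in which a pending transition is disabled (for example, x↑ can fire while l2 is still low, so that (20b) becomes effective and is then cut off by the late l2↑), whereas the atomic fork admits none.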

This is the only timing condition that must be fulfilled. In general, the
constraint is easy to meet because it is one-sided. The isochronicity
requirement is more difficult to meet, however, when a negated input
introduces an inverter on a branch of the fork, since the transition delays of
an inverter are of the same order of magnitude as the transition delays of
other operators. We have proved that, for the implementation of each language
construct, these inverters can always be eliminated from the isochronic forks
by simple transformations.3 (See [1, 2].)

In  [11],  we  have  proved  that  the  class  of  entirely  delay-insensitive  circuits 
is  very  limited:  Practically  all  circuits  of  interest  fall  outside  the  class.  We 
believe  that  the  notion  of  isochronic  fork  is  the  weakest  compromise  to  delay- 
insensitivity  sufficient  to  implement  any  circuit  of  interest. 

Which forks have to be isochronic is easy to decide by a simple analysis
of the PR sets. For instance, the fork ri f (r1, r2) also has to be isochronic,
but the fork x f (x1, x2) does not. We shall ignore the issue of isochronic
forks in the rest of this presentation.

15  Reshuffled  Implementations  of  (L/R) 

We  illustrate  the  use  of  reshuffling  by  deriving  two  other  implementations 
of  (L/R).  If  L  is  an  internal  channel  introduced  for  process  decomposition, 
we  can  reshuffle  the  handshaking  expansions  of  L  and  R  without  the  risk  of 
introducing  deadlock.  Let  us  return  to  handshaking  expansion  (14). 


15.1  First  Reshuffling 

We postpone the second half of the handshaking expansion of R, i.e., the
sequence ro↓; [¬ri], until after [¬li]. We get

*[[li]; ro↑; [ri]; lo↑; [¬li]; ro↓; [¬ri]; lo↓].

The syntactic PR expansion we now derive is already "program-ordered":

li   ↦  ro↑
ri   ↦  lo↑
¬li  ↦  ro↓
¬ri  ↦  lo↓.

3 These transformations have not been applied to the circuits presented here
as examples, but they are always applied before the circuits are actually
implemented.

The first and third rules specify the wire (li w ro); the second and fourth
rules specify the wire (ri w lo). Hence, the implementation reduces to two
wires!

15.2  Second  Reshuffling:  The  D-element 

We now postpone the whole handshaking expansion of R until after [¬li]. We
get

*[[li]; lo↑; [¬li]; ro↑; [ri]; ro↓; [¬ri]; lo↓].

We need to introduce a state variable, say x, as follows:

*[[li]; x↑; [x]; lo↑; [¬li]; ro↑; [ri]; x↓; [¬x]; ro↓; [¬ri]; lo↓].

The PR expansion gives

li          ↦  x↑
(ri ∨) x    ↦  lo↑
x ∧ ¬li     ↦  ro↑
ri          ↦  x↓
(li ∨) ¬x   ↦  ro↓
¬x ∧ ¬ri    ↦  lo↓.

The terms between parentheses have been added for symmetrization. The
operator reduction gives

(li, ¬ri) ff x
(ri, x) ∨ lo
(x, ¬li) ∧ ro.

The flip-flop can be replaced with the C-element (li, ¬ri) C x. The circuit,
shown in Figure 7, is called a D-element.
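As a sanity check, the symmetrized PR set of the D-element can be run against a simple environment. The sketch below is illustrative: the environment rules (one gate per channel) and the helper `run` are assumptions that are not part of the original text.

```python
RULES = [
    (lambda s: s["li"],                    "x",  True),   # li ↦ x↑
    (lambda s: s["ri"] or s["x"],          "lo", True),   # (ri ∨) x ↦ lo↑
    (lambda s: s["x"] and not s["li"],     "ro", True),   # x ∧ ¬li ↦ ro↑
    (lambda s: s["ri"],                    "x",  False),  # ri ↦ x↓
    (lambda s: s["li"] or not s["x"],      "ro", False),  # (li ∨) ¬x ↦ ro↓
    (lambda s: not s["x"] and not s["ri"], "lo", False),  # ¬x ∧ ¬ri ↦ lo↓
    # Assumed environment (not in the original text):
    (lambda s: not s["lo"], "li", True),
    (lambda s: s["lo"],     "li", False),
    (lambda s: s["ro"],     "ri", True),
    (lambda s: not s["ro"], "ri", False),
]

def run(rules, state, steps):
    """Fire the unique effective rule `steps` times and return the trace."""
    trace = []
    for _ in range(steps):
        ready = [r for r in rules if r[0](state) and state[r[1]] != r[2]]
        if len(ready) != 1:
            raise ValueError("execution is not sequential in %r" % state)
        _, var, value = ready[0]
        state[var] = value
        trace.append(var + ("↑" if value else "↓"))
    return trace

D_TRACE = run(RULES, dict.fromkeys(["li", "lo", "ri", "ro", "x"], False), 10)
print("; ".join(D_TRACE))
```

Under this environment the handshake on L completes its first half before R starts, and lo↓ is issued only after the handshake on R has finished, which matches the reshuffled expansion with the state variable x.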

16  Sequencing 

There are many ways to implement the sequencing of n arbitrary actions.
We shall introduce the basic operators that are used in the most
straightforward implementations.

16.1 The Active-Active Buffer

Consider the program *[S1; S2], where S1 and S2 are two arbitrary program
parts. Process decomposition of this program gives

*[L; R] ∥ (L'/S1) ∥ (R'/S2).

Hence the basic sequencing operator is the process

B(La, Ra) ≡ *[L; R],

where both L and R are active. This process is called an active-active buffer.
The handshaking expansion gives

*[lo↑; [li]; lo↓; [¬li]; ro↑; [ri]; ro↓; [¬ri]].   (22)

Since ri is false initially, we can rewrite (22) as

*[[¬ri]; lo↑; [li]; lo↓; [¬li]; ro↑; [ri]; ro↓].   (23)

By comparing (23) with (14), the handshaking expansion of the Q-element,
we observe that B(La, Ra) = (¬ri, ro) Q (li, lo), which gives the
implementation of Figure 8.

16.2  The  (L/A;R)-element 

Figure 7. The D-element.

In order to generalize the preceding construction to the case of an
arbitrary number of actions, we must implement the generalization of the
(L/R)-element. Sequence

*[S1; S2; ...; Sn]   (24)

can be decomposed into a number of shorter sequences by repeatedly
applying process decomposition. There are as many ways to decompose (24) as
there are binary trees of n leaves. But observe that, if n > 2, all
decompositions will require at least one process of the form

(L/A;R),

where A and R are active communication actions. (The semicolon binds more
tightly than the process call.) We shall use two different reshufflings to
implement this process. Again, these reshufflings maintain the semantics of the
original program if the handshaking expansion of L' is not reshuffled. The
first reshuffling is

*[[li]; ao↑; [ai]; lo↑; [¬li]; ao↓; [¬ai]; R; lo↓].

We decompose it into two sequences by applying a process-factorization
decomposition described in [10]:

( *[[li]; ao↑; [¬li]; ao↓]
∥ *[[ai]; lo↑; [¬ai]; R; lo↓]
).

The first sequence is the wire (li w ao). The second sequence is the D-element
(ai, lo) D (ri, ro).


Figure 8. Implementation of the active-active buffer with a Q-element.

The second reshuffling is

*[[li]; A; ro↑; [ri]; lo↑; [¬li]; ro↓; [¬ri]; lo↓].

Again, we decompose it into two sequences by process factorization:

( *[[ri]; lo↑; [¬ri]; lo↓]
∥ *[[li]; A; ro↑; [¬li]; ro↓]
).

The first sequence is the wire (ri w lo). The second sequence is the Q-element
(li, ro) Q (ai, ao). Both implementations are shown in Figure 9.

Now the implementation of a sequence of n actions is straightforward. For
instance, for n = 4, we have two "linear" decompositions of (L/S1; S2; S3; S4).
The first one is

((L/S1; L1) ∥ (L1/S2; L2) ∥ (L2/S3; S4)).

The second one is

((L/L2; S4) ∥ (L2/L1; S3) ∥ (L1/S1; S2)).

These two decompositions lead to the linear implementations shown in
Figure 10.

16.3  The  Passive-Active  Buffer 

Figure 9. Implementations of the (L/A;R)-element.

In order to compose one-place buffers in a linear chain, one channel must be
active and the other one passive. We implement the buffer with L passive and
R active. This version is denoted by B(Lp, Ra). In order to take advantage of
the active-active case, we decompose the buffer into two processes q and t:

q ≡ *[D'; R]
t ≡ (D/L).

Process q is an active-active buffer. The compilation of t is straightforward.
The handshaking expansion gives

*[[di]; [li]; lo↑; [¬li]; lo↓; do↑; [¬di]; do↓].

Since D is an internal channel, we can reshuffle the sequence [¬li]; lo↓ with
respect to D without introducing deadlock. (Also observe that since do↓
remains the last action of the sequence, we have not changed the order of L
relative to R.) We get

*[[di]; [li]; lo↑; do↑; [¬di]; [¬li]; lo↓; do↓].

The  PR  expansion  leading  to  the  circuit  of  Figure  6  is 

di ∧ li     ↦  lo↑, do↑
¬di ∧ ¬li   ↦  lo↓, do↓.

Process  t  is  used  to  connect  the  two  ports  of  a  channel  when  they  are  both 
active.  It  is  called  a  "passive-passive  adaptor".  The  complete  circuit  is  shown 
in  Figure  11. 

The  passive-active  buffer  can  be  compiled  directly  by  introducing  a  state 
variable.  The  circuit  obtained  is  slightly  different.  See  [8]. 


Figure 10. Implementations of (L/S1; S2; S3; S4).

17 Single-Variable Register

Consider the following register process, which provides read and write
access to a simple boolean variable, x:

*[[ P̅ → P?x
  ▯ Q̅ → Q!x
 ]],   (25)

where ¬P̅ ∨ ¬Q̅ holds at any time.

The handshaking expansion of (25) uses the double-rail technique. The
boolean value of x is encoded on two wires, one for the value true and one
for the value false. Input channel P has two input wires, pi1 for receiving
the value true and pi2 for receiving the value false, and one output wire,
po. Output channel Q has two output wires, qo1 for sending the value true
and qo2 for sending the value false, and one input wire, qi. Each guarded
command of (25) is expanded to two guarded commands:

*[[ pi1 → x↑; [x]; po↑; [¬pi1]; po↓
  ▯ pi2 → x↓; [¬x]; po↑; [¬pi2]; po↓
  ▯ x ∧ qi → qo1↑; [¬qi]; qo1↓
  ▯ ¬x ∧ qi → qo2↑; [¬qi]; qo2↓
 ]].   (26)


Figure 11. An implementation of the passive-active buffer.

17.1 Mutual Exclusion between Guarded Commands

We are now faced with a new problem: enforcing mutual exclusion between
the production-rule sets of different guarded commands. (This problem is
not concerned with making the guards of the different commands mutually
exclusive. For the time being, we are considering only examples where the
guards of the commands are already mutually exclusive.) Let us illustrate our
problem with the compilation of the first two guarded commands. If we just
concatenate the production-rule sets of these two commands, we get

pi1        ↦  x↑
pi1 ∧ x    ↦  po↑
¬pi1       ↦  po↓
pi2        ↦  x↓
pi2 ∧ ¬x   ↦  po↑
¬pi2       ↦  po↓.

We now observe, however, that the second and the sixth PRs are interfering
(they set and reset the same variable po), and that, for reasons of symmetry,
the same holds for the third and the fifth PRs.

Hence,  the  problem  of  ensuring  mutual  exclusion  between  PRs  of  different 
guarded  commands  is  the  same  as  enforcing  program  order  between  PRs  of 
the  same  guarded  command.  We  use  the  same  technique,  which  consists  in 
strengthening  the  guards  of  the  production  rules,  if  necessary,  by  introducing 
state  variables  to  distinguish  between  the  states  corresponding  to  each  true 
guard. 

In the case at hand, we strengthen the guards of the third and the sixth
rules as

x ∧ ¬pi1    ↦  po↓
¬x ∧ ¬pi2   ↦  po↓.

The rest of the implementation is straightforward. The first and fourth PRs
correspond to the flip-flop (pi1, ¬pi2) ff x. The other PRs can be transformed
into

(pi1 ∧ x) ∨ (pi2 ∧ ¬x)     ↦  po↑
(¬pi1 ∧ x) ∨ (¬pi2 ∧ ¬x)   ↦  po↓,

which is the definition of the if-operator (pi1, pi2, x) if po.
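That the two combined guards define a combinational operator can be confirmed with a truth table. The sketch below is a small illustration with hypothetical function names. It checks that the guards of po↑ and po↓ are complements of each other and that po follows pi1 when x is true and pi2 when x is false.

```python
from itertools import product

def guard_up(pi1, pi2, x):
    """Guard of po↑: (pi1 ∧ x) ∨ (pi2 ∧ ¬x)."""
    return (pi1 and x) or (pi2 and not x)

def guard_down(pi1, pi2, x):
    """Guard of po↓: (¬pi1 ∧ x) ∨ (¬pi2 ∧ ¬x), the two strengthened rules."""
    return (not pi1 and x) or (not pi2 and not x)

INPUTS = list(product([False, True], repeat=3))

# Exactly one of the two guards holds for every input combination.
COMPLEMENTARY = all(guard_up(*v) != guard_down(*v) for v in INPUTS)

# The operator selects pi1 when x holds and pi2 otherwise.
SELECTS = all(guard_up(p1, p2, x) == (p1 if x else p2) for p1, p2, x in INPUTS)

print(COMPLEMENTARY, SELECTS)
```

Since the guards are complementary, no state-holding element is needed for po.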
The production-rule expansion of the last two guarded commands of (26)
gives

x ∧ qi     ↦  qo1↑
¬x ∨ ¬qi   ↦  qo1↓
¬x ∧ qi    ↦  qo2↑
x ∨ ¬qi    ↦  qo2↓,

which corresponds to the two operators (x, qi) ∧ qo1 and (¬x, qi) ∧ qo2. The
circuit is represented in Figure 12.

In  the  next  example,  we  shall  refer  to  the  implementation  of  the  first  two 
guarded  commands  of  (26)  as  the  register  operator: 

(pi1, pi2) reg (po, x).

We  shall  refer  to  the  implementation  of  the  last  two  guarded  commands  of 
(26)  as  the  read  operator: 

(qi, x) read (qo1, qo2).

18  Implementation  of  the  Stack 

Figure 12. Single boolean register.

The implementation of the stack will be used to explain the general method
for implementing communications that involve passing messages. The method
relies on the time-honored "divide-and-conquer" principle: We first construct
the so-called control part of the program, which is the original program where
the messages have been removed from each communication action. We then
combine this control part with a data path, which is a program implementing
the assignment parts of the communication actions. (See Figure 16 in Section
20.) The basic technique for combining control and data was introduced in
[9].

18.1  The  Control  Part  of  the  Stack 

The control part of the stack consists of programs E and F, from which
message communication has been removed. We assume that the stack is empty
initially. We introduce the channel (t, t') so that F can be called from within
E by process decomposition. We get

E ≡ *[[ i̅n̅ → in; t
      ▯ o̅u̅t̅ → get; out
     ]]

F ≡ *[[ t̅' ∧ i̅n̅ → put; in
      ▯ t̅' ∧ o̅u̅t̅ → out; t'
     ]].

In the handshaking expansion, we let the choice of active and passive
communications be dictated by the occurrence of the probes. (We will, however,
return to this choice later.) We get

E ≡ *[[ ¬ti ∧ ini → ino↑; [¬ini]; ino↓; to↑; [ti]; to↓
      ▯ ¬ti ∧ outi → geto↑; [geti]; geto↓; [¬geti]; outo↑; [¬outi]; outo↓
     ]]

F ≡ *[[ ti' ∧ ini → puto↑; [puti]; puto↓; [¬puti]; ino↑; [¬ini]; ino↓
      ▯ ti' ∧ outi → outo↑; [¬outi]; outo↓; to'↑; [¬ti']; to'↓
     ]].

Observe that, after handshaking expansion, the symmetry between E and F
has been restored. The choice of whether ti or ti' should be negated in the
guards determines whether E or F should be called initially, i.e., whether we
start with an empty or a full stack element.




Chapter  1  Martin:  Programming  in  VLSI 


18.2  Compilation  of  E 

The first guarded command, E1, is a standard passive-active buffer. The
second guarded command, E2, is a standard Q-element. The implementation of
E must combine the implementations of E1 and E2 in a way that enforces
mutual exclusion between the execution of E1 and that of E2.

Since the execution of in and that of out are mutually exclusive, it
suffices to guarantee that when in is completed in E1, E2 cannot start until t
is completed. We introduce the variable z (initially true) in the handshaking
expansion of E1, as indicated in Figure 13, and we strengthen the guard of
E2 with z. We get

E1 ≡ z ∧ ini → ino↑; z↓; [¬z]; [¬ini]; ino↓; to↑; [ti]; to↓; [¬ti]; z↑,

E2 ≡ ¬ti ∧ outi ∧ z → geto↑; [geti]; geto↓; [¬geti]; outo↑; [¬outi]; outo↓.

Now E2 cannot start until z↑ is completed, i.e., until E1 is completed. Since,
by the structure of E1, z ⇒ ¬ti, we can simplify the guard of E2 to outi ∧ z.
For symmetrization, we also weaken ¬outi as ¬outi ∨ ¬z. Hence, mutual
exclusion is enforced by replacing input outi with the and-operator
(outi, z) ∧ outi' in the Q-element implementation of E2. This gives the circuit
of Figure 14 as an implementation of E.


Figure 13. Implementation of the first g.c. of E with variable z.

18.3 Compilation of F

The compilation of F1 is identical to that of E2 with the appropriate change
of variables. The compilation of F2, however, can be simplified by reshuffling.
Since channel (t, t') is internal, we can reshuffle the handshaking sequence of
t' without deadlock. The handshaking expansion of F2 becomes

ti' ∧ outi → outo↑; to'↑; [¬ti' ∧ ¬outi]; outo↓; to'↓,

which compiles immediately into the "forked" C-element
(ti', outi) C (outo, to'). The reshuffling guarantees that F1 cannot be started
before F2 is completed.

The channels in and out are used in both E and F, so we must merge the
local copies of in and the local copies of out in a standard way that we do not
describe here. The resulting circuit for the control part of the stack element
is shown in Figure 15.


Figure 14. Implementation of E.

19 Implementation of the Data Path

We now have to extend the implementation of the control part S2 so as
to obtain an implementation of the whole program S1. We want to leave S2
unchanged by introducing a datapath process, P, such that the parallel
composition of S2 and P implements S1.

Figure 15. The control part of the stack element.

The channels in, out, get, put of S2 are renamed in', out', get', put'. P
communicates with S2 via in', out', get', put' and with the environment via
in, out, get, put. (See Figure 16.)

Let C be a channel of S1, and C' be the renamed channel of S2 to which C
corresponds. For (S2 ∥ P) to implement S1, each communication on C must
coincide with a communication on C'; i.e., P must implement the so-called
channel interface process

Ic ≡ *[C • C'].

Hence, P has to implement the four channel interfaces:

*[in' • in?x]

*[out' • out!x]

*[get' • get?x]

*[put' • put!x].

20  Implementation  of  Channel  Interfaces 

There  are  four  types  of  channel  interfaces,  depending  on  whether  the  port 
is  active  or  passive,  and  whether  the  communication  is  an  input  or  an  output. 


Figure  16.  Adding  the  data  path. 




20.1  Input  Actions  on  a  Passive  Port 

We want to implement the interface Ic for action C?x on the passive port C.
Ic communicates with S2 by the active port C', and with the environment by
the passive port D. Furthermore, in the standard double-rail encoding
technique, the two-wire implementation (ci', co') of C' has to be interfaced to
the three-wire input port D in which the two input wires, di1 and di2, are used
to encode the two values of the incoming message. (See Figure 17.)

Ic has to implement an interleaving of the following three sequences:

Sc ≡ *[ci'↑; [co']; ci'↓; [¬co']]

SD ≡ *[[di1 ∨ di2]; do↑; [¬di1 ∧ ¬di2]; do↓]

Sx ≡ *[[di1 → x↑; [x] ▯ di2 → x↓; [¬x]]].

An implementation of C' • D interleaves sequences Sc and SD as

*[[di1 ∨ di2]; ci'↑; [co']; do↑; [¬di1 ∧ ¬di2]; ci'↓; [¬co']; do↓].   (28)

In the interleaving of (28) and Sx, the assignment to x is inserted after [co']
so as to ensure that communication action C has been started when the
assignment to x is performed:

*[[di1 ∨ di2]; ci'↑; [co' ∧ di1 → x↑; [x] ▯ co' ∧ di2 → x↓; [¬x]];
  do↑; [¬di1 ∧ ¬di2]; ci'↓; [¬co']; do↓].   (29)


Figure 17. Channel interface for input port.

Next, we factor (29) as

*[[di1 ∨ di2]; ci'↑; [¬di1 ∧ ¬di2]; ci'↓]   (30)

and

*[[ co' ∧ di1 → x↑; [x]; do↑; [¬co']; do↓
  ▯ co' ∧ di2 → x↓; [¬x]; do↑; [¬co']; do↓
 ]].   (31)

Sequence (30) is realized by the operator (di1, di2) ∨ ci'. We factor (31) so
as to isolate the register part:

(co', di1) aC x1 ≡ *[[co' ∧ di1]; x1↑; [¬co']; x1↓]

(co', di2) aC x2 ≡ *[[co' ∧ di2]; x2↑; [¬co']; x2↓]

(x1, x2) reg (x, do) ≡ *[[ x1 → x↑; [x]; do↑; [¬x1]; do↓
                         ▯ x2 → x↓; [¬x]; do↑; [¬x2]; do↓
                        ]].

The  implementation  is  shown  in  Figure  18. 

20.2  Input  Actions  on  an  Active  Port 

For port C active, the communication variables of the interface Ic remain the
same. But now the handshaking expansions of C' and D are different, since
C' is passive and D is active. We get

Sc ≡ *[[co']; ci'↑; [¬co']; ci'↓]

SD ≡ *[do↑; [di1 ∨ di2]; do↓; [¬di1 ∧ ¬di2]]

Sx ≡ *[[di1 → x↑; [x] ▯ di2 → x↓; [¬x]]].

(Observe that Sx is not changed.) An interleaving of Sc and SD that
implements C' • D is the interleaving corresponding to two wires:

*[[co']; do↑; [di1 ∨ di2]; ci'↑; [¬co']; do↓; [¬di1 ∧ ¬di2]; ci'↓].

As to the implementation of the assignment to x, we now observe that, since
C and D are active, there is no risk of the assignment to x being started before
C is. The interleaving obtained is

*[[co']; do↑; [di1 → x↑ ▯ di2 → x↓];
  ci'↑; [¬co']; do↓; [¬di1 ∧ ¬di2]; ci'↓],

which can be factored into the wire

(co' w do) ≡ *[[co']; do↑; [¬co']; do↓]

and the register

(di1, di2) reg (x, ci') ≡ *[[ di1 → x↑; [x]; ci'↑; [¬di1]; ci'↓
                            ▯ di2 → x↓; [¬x]; ci'↑; [¬di2]; ci'↓
                           ]].

The  implementation  of  the  interface  is  shown  in  Figure  19. 

Figure 18. Input actions on passive port.

20.3  Output  Actions

In the case of an output, like out!x or put!x, the implementation turns out to be the same for passive and active ports. Given the same nomenclature as in the input case, port D is now implemented with two output variables, do1 and do2, and one input variable, di. Port C' is not changed. The rest of the derivation is straightforward and is left as an exercise for the reader. It leads to a wire and a read operator, which we have introduced in the implementation of the register:


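$$di\ \mathrm{w}\ ci' \equiv *[[di];\ ci'\uparrow;\ [\neg di];\ ci'\downarrow]$$

$$\begin{aligned}
(co', x)\ \mathrm{read}\ (do1, do2) \equiv *[[\;& x \land co' \rightarrow do1\uparrow;\ [\neg co'];\ do1\downarrow \\
\square\;& \neg x \land co' \rightarrow do2\uparrow;\ [\neg co'];\ do2\downarrow \\
]]\;&.
\end{aligned}$$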


The  only  difference  between  the  active  and  the  passive  cases  is  that,  in  the 
active  case,  the  read  is  activated  first.  In  the  passive  case,  the  wire  is  activated 
first.  The  circuit  is  shown  in  Figure  20. 
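The behavior of the wire and the read operator during one output can be sketched in a few lines of Python. This is a minimal illustration, assuming that x is stable during the communication and that the receiver acknowledges as soon as either rail is raised; the function names and the receiver model are hypothetical. The sketch follows the ordering in which the read operator fires first.

```python
def read_operator(co, x):
    """(co', x) read (do1, do2): raise the rail matching x while co' is high."""
    return (co and x, co and not x)


def wire(di):
    """di w ci': ci' follows di."""
    return di


def output_handshake(x):
    """Return (do1, do2, ci') after the set phase and after the reset phase."""
    phases = []
    for co in (True, False):                # co' goes up, then down
        do1, do2 = read_operator(co, x)
        di = do1 or do2                     # assumed receiver: acknowledges any raised rail
        phases.append((do1, do2, wire(di)))
    return phases
```

For x true, only do1 is raised in the set phase; for x false, only do2. In both cases all three signals are low again after the reset phase.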

20.4  Active  Input  and  Passive  Output 

A somewhat surprising result of this implementation of input and output commands is that, contrary to common belief, it is simpler to implement input commands with active ports than with passive ports. The gain is quite important: For n bits of data, the active implementation saves 2 × n asymmetric C-elements and n or-gates. On the other hand, the implementation of output actions is the same for active and passive ports.

Figure 19. Input actions on active port.

Therefore, we shall always implement input actions with active ports. When the input port is probed, as port in is in the stack example, we shall use a slightly more complicated implementation of the handshaking protocol that makes it possible to probe an active port.

20.5  Lazy-Active  Protocol 

Consider the active implementation of communication command X:

$$xo\uparrow;\ [xi];\ xo\downarrow;\ [\neg xi].$$

We introduce an alternative active protocol, called lazy-active:

$$[\neg xi];\ xo\uparrow;\ [xi];\ xo\downarrow.$$

The lazy-active protocol is derived from the active one by postponing wait action $[\neg xi]$ until the beginning of the next communication on X, and by adding a vacuous wait action $[\neg xi]$ at the beginning of the first communication X. Hence, the lazy-active protocol is a correct implementation.
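The relation between the two protocols can be checked mechanically by unrolling both over several communications. The short Python sketch below is only an illustration; the ASCII action names (`xo+` for the up-transition, `[~xi]` for the wait on the negated input) and the function name are assumptions of the sketch.

```python
ACTIVE = ["xo+", "[xi]", "xo-", "[~xi]"]    # active four-phase protocol
LAZY = ["[~xi]", "xo+", "[xi]", "xo-"]      # lazy-active protocol


def lazy_is_shifted_active(k):
    """Compare k successive communications under both protocols.

    Dropping the vacuous leading wait of the lazy-active sequence must give
    the active sequence without its final (postponed) wait.
    """
    active = ACTIVE * k
    lazy = LAZY * k
    return lazy[1:] == active[:-1]
```

The check holds for any number of communications, which is the sense in which the lazy-active sequence is the active one with each final wait moved to the start of the next communication.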

Figure 20. Output-action interface.

Consider sequence X;S, where S is an arbitrary program part. With X lazy-active, half of the communication delays overlap with the execution of S. The gain is particularly important when data communication is involved, since half of the data-transmission delays and half of the "completion-tree" delays can overlap with the rest of the computation.
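To make the overlap concrete, consider a deliberately simplified timing model of the loop *[X; S]. The model is an assumption of this sketch, not a measurement: `set_phase` stands for the delay of the first half of the handshake, `reset_phase` for the delay of the second half, and `s_time` for the execution time of S. With the active protocol the three delays add up; with the lazy-active protocol the reset phase runs concurrently with S, and the next communication waits for whichever finishes last.

```python
def active_cycle_time(set_phase, reset_phase, s_time):
    """Cycle time of *[X; S] with X active: nothing overlaps."""
    return set_phase + reset_phase + s_time


def lazy_active_cycle_time(set_phase, reset_phase, s_time):
    """Cycle time of *[X; S] with X lazy-active: the reset phase overlaps S."""
    return set_phase + max(reset_phase, s_time)
```

Under this model, with both handshake halves taking 2 time units and S taking 5, the cycle shortens from 9 to 7 units; the saving is bounded by the shorter of the reset phase and S.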

This  important  property  of  lazy-active  protocols  was  discovered  recently 
by  Steve  Burns.  All  input  actions  are  now  implemented  as  lazy-active.  We  have 
not  done  so  in  the  stack,  which  is  an  older  design. 


21  The  Complete  Circuit  for  the  Stack

The sharing of register x by ports in and get has to be implemented either by a multiplexer or by a multiport flip-flop. Since only two ports share the register, we choose to use a dual-port flip-flop. The complete datapath is shown in Figure 21.

Figure 21. The complete datapath.

Figure 22. The complete circuit for a one-bit stack element.

The complete circuit obtained by composing the different parts together is shown in Figure 22. An important optimization has been added to the design. It concerns the implementation of the second guard of E:

$$\overline{out} \rightarrow get?x;\ out!x.$$

We observe that the value of x involved in the second action (out!x) is the same as the value of x involved in the first action (get?x). We can therefore encode the transmitted value in the handshaking expansion of the guarded command without having to use register x. We are tempted to make this optimization available to the programmer by allowing assignments to ports. We would then write

$$\overline{out} \rightarrow out!get.$$

The preceding modification leads to a significant simplification of the circuit since we can eliminate a D-element, and, for each bit of the data path, we can eliminate an IF-element and replace the multiport flip-flop with a simple flip-flop. The chip we have fabricated includes this modification, as well as the optimization that consists in making input port in active.


22  A  Delay-Insensitive  Fair  Arbiter 

This last example addresses the issues of arbitration between guards and unstable guards. We have already discussed the metastability property of arbiters. The realization of a delay-insensitive arbiter, however, raises another issue: fairness. An arbiter is strongly fair when a pending communication request is granted after a bounded number of other requests are granted. An arbiter is weakly fair when a request is granted after a finite but possibly unbounded number of other requests. Whether it is possible to construct a delay-insensitive fair arbiter has been, so far, an open question. It has been conjectured that delay-insensitive fair arbiters do not exist. In this example, we prove the existence of delay-insensitive fair arbiters by constructing one.

22.1  A  Fair-Arbiter  Program 

The process fsel described in the first part defines a fair arbitration program between two unrelated inputs. We choose to implement the following simplified version of fsel:

$$*[[\overline{A} \rightarrow A\ \ \square\ \ \neg\overline{A} \rightarrow skip];\ [\overline{B} \rightarrow B\ \ \square\ \ \neg\overline{B} \rightarrow skip]]. \tag{33}$$

According to (33), when $\overline{A}$ holds, A will be completed after at most one B action, regardless of the current state of the computation. Hence, the arbiter is strongly fair towards requests A and B. Assume that A' is pending at a certain point of the computation. By definition of the probe, $\overline{A}$ is true eventually; i.e., a finite but unbounded number of B actions can be completed between the moment qA' holds and the moment $\overline{A}$ holds. Hence, the arbiter is only weakly fair towards requests A' and B'.

Therefore, with this definition of suspension of an action, we can say that the arbiter is strongly fair towards requests that have reached the arbiter and weakly fair towards all requests. (We could redefine the suspension of a communication action X such that qX holds only when the initiation of action X can be observed by the other process. With this definition of suspension, we have $qA' = \overline{A}$. The arbiter is then strongly fair towards all requests.)
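The strong-fairness bound for requests that have reached the arbiter can be illustrated with a small simulation of the alternation in (33). The Python sketch below is a toy model under stated assumptions: B is requesting at all times, the probe of A becomes true at a chosen step and stays true, and each selection statement takes one step. The function name and the step-counting convention are hypothetical.

```python
def b_grants_while_a_waits(a_probe_true_from):
    """Count B actions completed while the probe of A holds, before A is granted.

    The loop alternates the two selections of (33): first test the probe
    of A, then test the probe of B (assumed here to hold at every step).
    """
    step = 0
    waited = 0
    while True:
        # First selection: complete A if its probe holds, else skip.
        if step >= a_probe_true_from:
            return waited
        step += 1
        # Second selection: B is always pending, so B is completed.
        if step >= a_probe_true_from:
            waited += 1                     # A's probe already held during this B
        step += 1
```

Whatever the step at which the probe of A becomes true, the count never exceeds one, which matches the claim that A is completed after at most one B action.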


22.2  The  Compilation 

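Applying the process decomposition rule, we decompose (33) into three processes (P1 || P2 || P3). Channels (C, D) between P1 and P2, and (E, F) between P1 and P3 are introduced:

$$P1 \equiv *[E;\ C]$$

$$\begin{aligned}
P2 \equiv *[[\;& \overline{D} \land \overline{B} \rightarrow B;\ D \\
\square\;& \overline{D} \land \neg\overline{B} \rightarrow D \\
]]\;&
\end{aligned}$$

$$\begin{aligned}
P3 \equiv *[[\;& \overline{F} \land \overline{A} \rightarrow A;\ F \\
\square\;& \overline{F} \land \neg\overline{A} \rightarrow F \\
]]\;&.
\end{aligned}$$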

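Ports D and F are implemented as passive; ports C and E are implemented as active. Hence P1 is the standard active-active buffer. The handshaking expansion of P2 gives

$$\begin{aligned}
P2 \equiv *[[\;& di \land bi \rightarrow bo\uparrow;\ [\neg bi];\ bo\downarrow;\ do\uparrow;\ [\neg di];\ do\downarrow \\
\square\;& di \land \neg bi \rightarrow do\uparrow;\ [\neg di];\ do\downarrow \\
]]\;&.
\end{aligned}$$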


Because bi can change from false to true asynchronously, the second guard of P2 is not stable; i.e., its value can change from true to false at any time. In order to make both guards of P2 stable, we introduce the synchronizer

$$\begin{aligned}
sync \equiv *[[\;& di \land bi \rightarrow u\uparrow;\ [\neg di];\ u\downarrow \\
\square\;& di \land \neg bi \rightarrow v\uparrow;\ [\neg di];\ v\downarrow \\
]]\;&.
\end{aligned}$$

Sync is the standard operator we have described in Part I. We now have to find a process, X, such that $(X \parallel sync) \equiv P2$. Since sync is entirely defined, we would like to be able to perform the inverse operation of $\parallel$, or "process quotient", so as to compute X as $X \equiv (P2 \div sync)$. A way to perform this quotient is to remove all actions of sync from P2, and then to check whether the result fulfills $(X \parallel sync) \equiv P2$.

To perform the quotient as suggested, P2 should be extended to contain all actions of sync, so that the orders of actions are compatible in sync and in the extended version of P2. (This procedure is explained in [10].) The extension of P2 gives

$$\begin{aligned}
*[[\;& di \land bi \rightarrow u\uparrow;\ [u];\ bo\uparrow;\ [\neg bi];\ bo\downarrow;\ do\uparrow;\ [\neg di];\ u\downarrow;\ [\neg u];\ do\downarrow \\
\square\;& di \land \neg bi \rightarrow v\uparrow;\ [v];\ do\uparrow;\ [\neg di];\ v\downarrow;\ [\neg v];\ do\downarrow \\
]]\;&.
\end{aligned}$$


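We obtain for X

$$\begin{aligned}
*[[\;& u \rightarrow bo\uparrow;\ [\neg bi];\ bo\downarrow;\ do\uparrow;\ [\neg u];\ do\downarrow \\
\square\;& v \rightarrow do\uparrow;\ [\neg v];\ do\downarrow \\
]]\;&.
\end{aligned}$$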
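The first step of the quotient, removing the actions of sync from the extended version of P2, amounts to filtering two action lists. The Python sketch below illustrates only that step, under stated assumptions: each guarded command is written as a flat list of ASCII action names (`u+` for the up-transition of u, `[~bi]` for a wait on the negation of bi), guards are represented as wait actions, and the names `EXTENDED_P2`, `SYNC`, and `quotient` are hypothetical. It does not perform the subsequent check that the parallel composition of X and sync is equivalent to P2.

```python
# Actions of the extended P2, one list per guarded command.
EXTENDED_P2 = {
    "first": ["[di & bi]", "u+", "[u]", "bo+", "[~bi]", "bo-",
              "do+", "[~di]", "u-", "[~u]", "do-"],
    "second": ["[di & ~bi]", "v+", "[v]", "do+", "[~di]", "v-", "[~v]", "do-"],
}

# Actions of the synchronizer, one list per guarded command.
SYNC = {
    "first": ["[di & bi]", "u+", "[~di]", "u-"],
    "second": ["[di & ~bi]", "v+", "[~di]", "v-"],
}


def quotient(extended, divisor):
    """Remove the divisor's actions from each guarded command, keeping order."""
    return {
        name: [a for a in actions if a not in divisor[name]]
        for name, actions in extended.items()
    }


X = quotient(EXTENDED_P2, SYNC)
```

The remaining lists start with the waits `[u]` and `[v]`, which play the role of the guards u and v in the process obtained for X.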


The compilation of the first guarded command is facilitated if transition $bo\downarrow$ is postponed until after $[\neg u]$. This transformation does not introduce deadlock since the completion of D does not depend on the completion of B. After this transformation, the PR expansion gives

$$\begin{array}{ll}
u \mapsto bo\uparrow & \neg u \mapsto bo\downarrow \\
u \land \neg bi \mapsto do\uparrow & v \mapsto do\uparrow \\
bi \lor \neg u \mapsto do\downarrow & \neg v \mapsto do\downarrow .
\end{array}$$

The operator reduction, which includes the introduction of auxiliary variables do' and do'', gives

$$\begin{array}{l}
u\ \mathrm{w}\ bo \\
(u, \neg bi) \land do' \\
v\ \mathrm{w}\ do'' \\
(do', do'') \lor do .
\end{array}$$

The  circuit  is  shown  in  Figure  23.  The  implementation  of  P3  is  identical. 
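The operator network obtained for X is small enough to evaluate directly. The Python sketch below treats the two wires, the and-gate, and the or-gate as combinational functions and steps through both guarded commands. The function name and the input sequences chosen for u, v, and bi are assumptions of this sketch; they follow the reshuffled handshake in which bo is lowered only after u is lowered.

```python
def x_circuit(u, v, bi):
    """Evaluate the operator network for X; returns (bo, do)."""
    bo = u                      # wire: u w bo
    do_prime = u and not bi     # and-gate: (u, not bi) -> do'
    do_second = v               # wire: v w do''
    do = do_prime or do_second  # or-gate: (do', do'') -> do
    return bo, do


# First guarded command: B was pending (bi high) when sync raised u.
first_branch = [
    x_circuit(u=True, v=False, bi=True),    # u up: bo goes up, do stays low
    x_circuit(u=True, v=False, bi=False),   # bi lowered: do goes up
    x_circuit(u=False, v=False, bi=False),  # u down: bo and do go down
]

# Second guarded command: B was not pending when sync raised v.
second_branch = [
    x_circuit(u=False, v=True, bi=False),   # v up: do goes up, bo stays low
    x_circuit(u=False, v=False, bi=False),  # v down: do goes down
]
```

In the first sequence do rises only after bi has been lowered, so the communication on D completes after B has been acknowledged; in the second sequence bo never rises.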

22.3  The  Circuit 

The final circuit, shown in Figure 24, is obtained by composing the two identical circuits implementing P2 and P3 with the circuit of P1. The reshuffled version of P1, consisting of a wire and an inverter, can also be used if it can be proved that the reshuffling does not introduce deadlock. The circuit shown in Figure 24 includes a minor optimization that eliminates the negated inputs that are also the output of a fork.

Figure 23. Implementation of P2.

Figure 24. Implementation of the fair arbiter.

23  Conclusion

We have described a method for implementing a concurrent program (a set of communicating processes) as a network of digital operators that can be directly mapped into a delay-insensitive VLSI circuit. The circuit is derived from the program by applying a series of systematic, semantics-preserving transformations that we have compared to compiling. Hence, the circuits are correct by construction, and their logical correctness is independent of the delays in operators and wires, with the exception of isochronic forks.

The examples cover most of the constructs of the language but not all of them: We have not shown how to implement an arbitrary set of guards. Therefore, we have not quite shown that any program in the language can be compiled. Such a proof has been given in [1] and [2], where the compilation of each construct is described as part of the basic algorithm for an automatic compiler. It is shown that any program in a subset of the language can be implemented as a delay-insensitive circuit using only a small set of basic elements: the two-input C-element, the two-input or-gate or two-input and-gate, the synchronizer, the inverter, and the isochronic fork.

There is no reason, however, for confining the designer to a minimal set of operators. On the contrary, since an advantage of VLSI is the possibility to create operators at no cost, introducing the special-purpose operator that exactly implements an arbitrary set of production rules often simplifies a circuit drastically.

In order to convince the VLSI community of the practicality of our method, it was essential to fabricate the circuits we had designed. Hence, all significant examples that we have used in our research (distributed mutual exclusion, queues, stacks, routing automata for a communication network, the 3x + 1 engine) have been fabricated in SCMOS using the MOSIS foundry service. They have all been found to be correct on "first silicon". They are also very robust and, given the low level of circuit optimization applied, surprisingly fast. The 3x + 1 engine, constructed by Tony Lee, is a special-purpose processor consisting of a state-machine and an 80-bit-wide datapath. It contains approximately 40,000 transistors and operates at over 8 MIPS (million instructions per second) in 2µm MOSIS SCMOS technology.

At the moment of writing, we have just completed the design of the first asynchronous general-purpose microprocessor [12]. It is a 16-bit RISC-like architecture with independent instruction and data memories. It has 16 registers, four buses, an ALU, and two adders. The size is about 20,000 transistors. Two versions have been fabricated: one in 2µm MOSIS SCMOS, and one in 1.6µm MOSIS SCMOS. (On the 2µm version, only 12 registers were implemented in order to fit the chip on an 84-pin 6600µm × 4600µm package.)

The chips are entirely delay-insensitive, with the sole exception of the interface with the memories and, of course, the isochronic forks. In the absence of available memories with asynchronous interfaces, we have simulated the completion signal from the memories with an external (off-chip) delay. For testing purposes, the delay on the instruction memory interface is variable.

In spite of the presence of floating n-wells, the 2µm version runs at 12 MIPS. The 1.6µm version runs at 18 MIPS. (Those performance figures are based on measurements from sequences of ALU instructions without carry. They take no advantage of the overlap between ALU and memory instructions.) Those performances are quite encouraging given that the design is very conservative: no pass-transistors, static gates, dual-rail encoding of data, completion trees, etc.

Only 2 of the 12 2µm chips passed all tests, but 34 of the 50 1.6µm chips were found to be entirely functional.

We have tested the chips under a wide range of VDD voltage values. At room temperature, the 2µm version is functional in a voltage range from 7V down to 1V! It reaches 15 MIPS at 7V. We have also tested the chips cooled in liquid nitrogen. The 2µm version reaches 20 MIPS at 5V and 30 MIPS at 12V. The 1.6µm version reaches 30 MIPS at 5V. Of course, these measurements are made without adjusting any clocks (there are none), but simply by connecting the processor to a memory containing a test program and observing the rate of instruction execution. The power consumption is 145mW at 5V, and 6.7mW at 2V.